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Chapter 1 


Introduction 


The AMD64)| architecturd?|is an extension of the x86 architecture. Any processor 
implementing the AMD64 architecture specification will also provide compatibil- 
ity modes for previous descendants of the Intel 8086 architecture, including 32-bit 
processors such as the Intel 386, Intel Pentium, and AMD K6-2 processor. Oper- 
ating systems conforming to the AMD64 ABI may provide support for executing 
programs that are designed to execute in these compatibility modes. The AMD64 
ABI does not apply to such programs; this document applies only to programs 
running in the “long” mode provided by the AMD64 architecture. 

Except where otherwise noted, the AMD64 architecture ABI follows the con- 
ventions described in the Intel386 ABI. Rather than replicate the entire contents 
of the Intel386 ABI, the AMD64 ABI indicates only those places where changes 
have been made to the Intel386 ABI. 

No attempt has been made to specify an ABI for languages other than C. How- 
ever, it is assumed that many programming languages will wish to link with code 
written in C, so that the ABI specifications documented here apply there too} 


'AMD64 has been previously called x86-64. The latter name is used in a number of places out 
of historical reasons instead of AMD64. 


The architecture specification is available on the web at http: //www.x86-64.org/ 


See section for details on C++ ABI. 
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Chapter 2 


Software Installation 


This document does not specify how software must be installed on an AMD64 
architecture machine. 
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Chapter 3 


Low Level System Information 


3.1 Machine Interface 


3.1.1 Processor Architecture 


3.1.2 Data Representation 


Within this specification, the term byte refers to a 8-bit object, the term twobyte 
refers to a 16-bit object, the term fourbyte refers to a 32-bit object, the term eight- 
byte refers to a 64-bit object, and the term sixteenbyte refers to a 128-bit object]!] 


Fundamental Types 


Figure shows the correspondence between ISO C’s scalar types and the pro- 
cessor’s. __ int128, float128,__m64,__m128 and __m256 types are 
optional. 

The __ float 128 type uses a 15-bit exponent, a 113-bit mantissa (the high 
order significant bit is implicit) and an exponent bias of 16383)7| 

The long double type uses a 15 bit exponent, a 64-bit mantissa with an ex- 
plicit high order significant bit and an exponent bias of 16383/}| Although a long 


'The Intel386 ABI uses the term halfword for a 16-bit object, the term word for a 32-bit object, 
the term doubleword for a 64-bit object. But most IA-32 processor specific documentation define 
a word as a 16-bit object, a doubleword as a 32-bit object, a quadword as a 64-bit object and a 
double quadword as a 128-bit object. 

“Initial implementations of the AMD64 architecture are expected to support operations on the 
__float128 type only via software emulation. 

3This type is the x87 double extended precision data type. 
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Figure 3.1: Scalar Types 


Alignment AMD64 
Type C sizeof (bytes) Architecture 
_Boolt 1 1 boolean 
char 1 1 signed byte 
signed char 
unsigned char 1 1 unsigned byte 
short 2 2 signed twobyte 
signed short 
unsigned short 2 2 unsigned twobyte 
int 4 4 signed fourbyte 
Integral signed int 
enumltt 
unsigned int 4 4 unsigned fourbyte 
long 8 8 signed eightbyte 
signed long 
long long 
signed long long 
unsigned long 8 8 unsigned eightbyte 
unsigned long long 8 8 unsigned eightbyte 
__inti2eft 16 16 signed sixteenbyte 
signed __int12eit 16 16 signed sixteenbyte 
unsigned __int128f7 16 16 unsigned sixteenbyte 
Pointer any-type * 8 8 unsigned eightbyte 
any-type (*) () 
Floating- | float 4 4 single (IEEE-754) 
point double 8 8 double (IEEE-754) 
long double 16 16 80-bit extended (IEEE-754) 
__float12ett 16 16 128-bit extended (IEEE-754) 
Decimal- | _Decimal32 4 4 32bit BID (IEEE-754R) 
floating- | _Decimal64 8 8 64bit BID (IEEE-754R) 
point _Decimal128 16 16 128bit BID (IEEE-754R) 
Packed __ m64it 8 8 MMxX and 3DNow! 
__mi2el! 16 16 SSE and SSE-2 
__ m256!! 32 32 AVX 


' This type is called bool in C++. 
'T These types are optional. 

tT C++ and some implementations of C permit enums larger than an int. The underlying 
type is bumped to an unsigned int, long int or unsigned long int, in that order. 
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double requires 16 bytes of storage, only the first 10 bytes are significant. The 
remaining six bytes are tail padding, and the contents of these bytes are undefined. 

The __int128 type is stored in little-endian order in memory, i.e., the 64 
low-order bits are stored at a a lower address than the 64 high-order bits. 

A null pointer (for all types) has the value zero. 

The type size_t is defined as unsigned long. 

Booleans, when stored in a memory object, are stored as single byte objects the 
value of which is always 0 (false) or 1 (true). When stored in integer registers 
(except for passing as arguments), all 8 bytes of the register are significant; any 
nonzero value is considered t rue. 

Like the Intel386 architecture, the AMD64 architecture in general does not re- 
quire all data accesses to be properly aligned. Misaligned data accesses are slower 
than aligned accesses but otherwise behave identically. The only exceptions are 
that m128 and__m256 must always be aligned properly. 


Aggregates and Unions 


Structures and unions assume the alignment of their most strictly aligned compo- 
nent. Each member is assigned to the lowest available offset with the appropriate 
alignment. The size of any object is always a multiple of the object‘s alignment. 

An atray uses the same alignment as its elements, except that a local or global 
array variable of length at least 16 bytes or a C99 variable-length array variable 
always has alignment of at least 16 bytesf{| 

Structure and union objects can require padding to meet size and alignment 
constraints. The contents of any padding is undefined. 


Bit-Fields 


C struct and union definitions may include bit-fields that define integral values of 
a specified size. 

The ABI does not permit bit-fields having the type__m64,__ m128 or__m256. 
Programs using bit-fields of these types are not portable. 

Bit-fields that are neither signed nor unsigned always have non-negative val- 
ues. Although they may have type char, short, int, or long (which can have neg- 


4The alignment requirement allows the use of SSE instructions when operating on the array. 
The compiler cannot in general calculate the size of a variable-length array (VLA), but it is ex- 
pected that most VLAs will require at least 16 bytes, so it is logical to mandate that VLAs have at 
least a 16-byte alignment. 
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Figure 3.2: Bit-Field Ranges 


Bit-field Type Width w Range 
signed char me ae (8 0 cee | 
char 1to8 Oto 2” —1 
unsigned char Oto 2” —1 
signed short ee nee | 
short 1 to 16 Oto 2”-—1 
unsigned short Oto 2” —1 
signed int ee se | 
int 1 to 32 Oto 2” —1 
unsigned int Oto 2” —1 
signed long ee pee | 
long 1 to 64 Oto 2” —1 
unsigned long Oto 2”—1 


ative values), these bit-fields have the same range as a bit-field of the same size 
with the corresponding unsigned type. Bit-fields obey the same size and alignment 
rules as other structure and union members. 

Also: 


e bit-fields are allocated from right to left 


e bit-fields must be contained in a storage unit appropriate for its declared 
type 


e bit-fields may share a storage unit with other struct / union members 


Unnamed bit-fields’ types do not affect the alignment of a structure or union. 


3.2 Function Calling Sequence 


This section describes the standard function calling sequence, including stack 
frame layout, register usage, parameter passing and so on. 

The standard calling sequence requirements apply only to global functions. 
Local functions that are not reachable from other compilation units may use dif- 
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ferent conventions. Nevertheless, it is recommended that all functions use the 
standard calling sequence when possible. 


3.2.1 Registers and the Stack Frame 


The AMD64 architecture provides 16 general purpose 64-bit registers. In addition 
the architecture provides 16 SSE registers, each 128 bits wide and 8 x87 floating 
point registers, each 80 bits wide. Each of the x87 floating point registers may be 
referred to in MMX/3DNow! mode as a 64-bit register. All of these registers are 
global to all procedures active for a given thread. 

Intel AVX (Advanced Vector Extensions) provides 16 256-bit wide AVX reg- 
isters (SymmO - Symm15). The lower 128-bits of SymmO - %ymm15 are aliased 
to the respective 128b-bit SSE registers (¢xmm0O - $xmm15). For purposes of pa- 
rameter passing and function return, 3xmmN and %ymmN refer to the same register. 
Only one of them can be used at the same time. We use vector register to refer to 
either SSE or AVX register. 

This subsection discusses usage of each register. Registers srbp, Srbx and 
$r12 through %r15 “belong” to the calling function and the called function is 
required to preserve their values. In other words, a called function must preserve 
these registers’ values for its caller. Remaining registers “belong” to the called 
function}>| If a calling function wants to preserve such a register value across a 
function call, it must save the value in its local stack frame. 

The CPU shall be in x87 mode upon entry to a function. Therefore, every 
function that uses the MMX registers is required to issue an emms or femms 
instruction after using MMxX registers, before returning or calling another function. 
fF The direction flag DF in the srF LAGS register must be clear (set to “forward” 
direction) on function entry and return. Other user flags have no specified role in 
the standard calling sequence and are not preserved across calls. 

The control bits of the MXCSR register are callee-saved (preserved across 
calls), while the status bits are caller-saved (not preserved). The x87 status word 
register is caller-saved, whereas the x87 control word is callee-saved. 


5Note that in contrast to the Intel386 ABI, rdi, and $rsi belong to the called function, not 
the caller. 

©All x87 registers are caller-saved, so callees that make use of the MMX registers may use the 
faster femms instruction. 
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Figure 3.3: Stack Frame with Base Pointer 


Position Contents Frame 
8n+16(%rbp) | memory argument eightbyte n 
ss Previous 
16(%rbp) | memory argument eightbyte 0 
8 (Srbp) return address 
0 (%rbp) previous Srbp value 
=8(SErbp) unspecified Current 
0(%rsp) variable size 
+128 (Srsp) red zone 


3.2.2 The Stack Frame 


In addition to registers, each function has a frame on the run-time stack. This stack 
grows downwards from high addresses. Figure[3.3|shows the stack organization. 

The end of the input argument area shall be aligned on a 16 (32, if ___m256 is 
passed on stack) byte boundary. In other words, the value (Srsp + 8) is always 
a multiple of 16 (32) when control is transferred to the function entry point. The 
stack pointer, rsp, always points to the end of the latest allocated stack frame. 

The 128-byte area beyond the location pointed to by %rsp is considered to 
be reserved and shall not be modified by signal or interrupt handlers|| Therefore, 
functions may use this area for temporary data that is not needed across function 
calls. In particular, leaf functions may use this area for their entire stack frame, 
rather than adjusting the stack pointer in the prologue and epilogue. This area is 
known as the red zone. 


The conventional use of %rbp as a frame pointer for the stack frame may be avoided by using 
%rsp (the stack pointer) to index into the stack frame. This technique saves two instructions in 
the prologue and epilogue and makes one additional general-purpose register (Srbp) available. 

8Locations within 128 bytes can be addressed using one-byte displacements. 
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3.2.3 Parameter Passing 


After the argument values have been computed, they are placed either in regis- 
ters or pushed on the stack. The way how values are passed is described in the 
following sections. 


Definitions We first define a number of classes to classify arguments. The 
classes are corresponding to AMD64 register classes and defined as: 


INTEGER This class consists of integral types that fit into one of the general 
purpose registers. 


SSE The class consists of types that fit into a vector register. 


SSEUP The class consists of types that fit into a vector register and can be passed 
and returned in the upper bytes of it. 


X87, X87UP These classes consists of types that will be returned via the x87 
FPU. 


COMPLEX_X87 This class consists of types that will be returned via the x87 
FPU. 


NO_CLASS This class is used as initializer in the algorithms. It will be used for 
padding and empty structures and unions. 


MEMORY This class consists of types that will be passed and returned in mem- 
ory via the stack. 


Classification The size of each argument gets rounded up to eightbytes)?| 
The basic types are assigned their natural classes: 


e Arguments of types (signed and unsigned) _Bool, char, short, int, 
long, Long long, and pointers are in the INTEGER class. 


e Arguments of types float, double, Decimal32,_ Decimalé64 and 
__m64 are in class SSE. 


Therefore the stack will always be eightbyte aligned. 
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e Arguments of types__ float128,_ Decimal128 and__m128 are split 
into two halves. The least significant ones belong to class SSE, the most 
significant one to class SSEUP. 


e Arguments of type __m256 are split into four eightbyte chunks. The least 
significant one belongs to class SSE and all the others to class SSEUP. 


The 64-bit mantissa of arguments of type Long double belongs to class 
X87, the 16-bit exponent plus 6 bytes of padding belongs to class X87UP. 


e Arguments of type __int128 offer the same operations as INTEGERs, 
yet they do not fit into one general purpose register but require two registers. 
For classification purposes ___ int 128 is treated as if it were implemented 
as: 


typedef struct { 
long low, high; 
} _ int128; 


with the exception that arguments of type __int128 that are stored in 
memory must be aligned on a 16-byte boundary. 


e Arguments of complex T where Tis oneofthetypes float or double 
are treated as if they are implemented as: 


struct complexT { 
T real; 
T imag; 


}; 


e A variable of type complex long double is classified as type COM- 
PLEX_X87. 


The classification of aggregate (structures and arrays) and union types works 
as follows: 


1. If the size of an object is larger than four eightbytes, or it contains unaligned 
fields, it has class MEMORY [>| 


'0The post merger clean up described later ensures that, for the processors that do not support 
the __m256 type, if the size of an object is larger than two eightbytes and the first eightbyte is not 
SSE or any other eightbyte is not SSEUP, it still has class MEMORY. This in turn ensures that for 
processors that do support the __m256 type, if the size of an object is four eightbytes and the first 
eightbyte is SSE and all other eightbytes are SSEUP, it can be passed in a register. 
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2. If a C++ object has either a non-trivial copy constructor or a non-trivial 
destructor|!"| it is passed by invisible reference (the object is replaced in the 
parameter list by a pointer that has class INTEGER) 


3. If the size of the aggregate exceeds a single eightbyte, each is classified 
separately. Each eightbyte gets initialized to class NO_CLASS. 


4. Each field of an object is classified recursively so that always two fields are 
considered. The resulting class is calculated according to the classes of the 
fields in the eightbyte: 


(a) If both classes are equal, this is the resulting class. 


(b) If one of the classes is NO_CLASS, the resulting class is the other 
class. 


(c) If one of the classes is MEMORY, the result is the MEMORY class. 
(d) If one of the classes is INTEGER, the result is the INTEGER. 


(e) If one of the classes is X87, X87UP, COMPLEX_X87 class, MEM- 
ORY is used as class. 


(f) Otherwise class SSE is used. 
5. Then a post merger cleanup is done: 


(a) If one of the classes is MEMORY, the whole argument is passed in 
memory. 

(b) If X87UP is not preceded by X87, the whole argument is passed in 
memory. 

(c) If the size of the aggregate exceeds two eightbytes and the first eight- 
byte isn’t SSE or any other eightbyte isn’t SSEUP, the whole argument 
is passed in memory. 


"4 de/constructor is trivial if it is an implicitly-declared default de/constructor and if: 
e its class has no virtual functions and no virtual base classes, and 
e all the direct base classes of its class have trivial de/constructors, and 


e for all the nonstatic data members of its class that are of class type (or array thereof), each 
such class has a trivial de/constructor. 


An object with either a non-trivial copy constructor or a non-trivial destructor cannot be 


passed by value because such objects must have well defined addresses. Similar issues apply 
when returning an object from a function. 
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(d) If SSEUP is not preceded by SSE or SSEUP, it is converted to SSE. 


Passing Once arguments are classified, the registers get assigned (in left-to-right 
order) for passing as follows: 


1. If the class is MEMORY, pass the argument on the stack. 


2. If the class is INTEGER, the next available register of the sequence rdi, 
Srsi, srdx, *rcx, $r8 and $r9 is used!] 


3. If the class is SSE, the next available vector register is used, the registers 
are taken in the order from %xmm0O to $xmm7. 


4. If the class is SSEUP, the eightbyte is passed in the next available eightbyte 
chunk of the last used vector register. 


5. If the class is X87, X87UP or COMPLEX_X87, it is passed in memory. 


When a value of type _Boo 1 is returned or passed in a register or on the stack, 
bit O contains the truth value and bits 1 to 7 shall be zerd!4| 

If there are no registers available for any eightbyte of an argument, the whole 
argument is passed on the stack. If registers have already been assigned for some 
eightbytes of such an argument, the assignments get reverted. 

Once registers are assigned, the arguments passed in memory are pushed on 
the stack in reversed (right-to-lef(">) order. 

For calls that may call functions that use varargs or stdargs (prototype-less 
calls or calls to functions containing ellipsis (...) in the declaration) % al is used 
as hidden argument to specify the number of vector registers used. The contents 


Note that $r11 is neither required to be preserved, nor is it used to pass arguments. Making 
this register available as scratch register means that code in the PLT need not spill any registers 
when computing the address to which control needs to be transferred. %rax is used to indicate the 
number of vector arguments passed to a function requiring a variable number of arguments. %r10 
is used for passing a function’s static chain pointer. 

'4Other bits are left unspecified, hence the consumer side of those values can rely on it being 0 
or | when truncated to 8 bit. 

'Right-to-left order on the stack makes the handling of functions that take a variable number 
of arguments simpler. The location of the first argument can always be computed statically, based 
on the type of that argument. It would be difficult to compute the address of the first argument if 
the arguments were pushed in left-to-right order. 

'©Note that the rest of Srax is undefined, only the contents of a1 is defined. 
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Figure 3.4: Register Usage 


Preserved across 


Register Usage function calls 

S$rax temporary register; with variable arguments | No 
passes information about the number of vector 
registers used; 1** return register 

Srbx callee-saved register; optionally used as base | Yes 
pointer 

SCX used to pass 4'* integer argument to functions No 

Srdx used to pass 3'¢ argument to functions; 2"¢ return | No 
register 

Srsp stack pointer Yes 

Srbp callee-saved register; optionally used as frame | Yes 
pointer 

BES. used to pass 2"¢ argument to functions No 

Srdi used to pass 1** argument to functions No 

Sr8 used to pass 5‘* argument to functions No 

SLO used to pass 6** argument to functions No 

$r10 temporary register, used for passing a function’s | No 
static chain pointer 

ce lest temporary register No 

SE Lg or is callee-saved registers Yes 

$xmmO0—%xmm1 | used to pass and return floating point arguments | No 

$xmm2—-%Sxmm7 | used to pass floating point arguments No 

$xmm8—%xmm15 | temporary registers No 

SmmxO0—%Smmx7 | temporary registers No 

sst0,sstl temporary registers; used to return long | No 
double arguments 

$st2-Sst7 temporary registers No 

Sfs Reserved for system (as thread specific data reg- | No 
ister) 

mxCSLr SSE2 control and status word partial 

x87 SW x87 status word No 

x87 CW x87 control word Yes 
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of a1 do not need to match exactly the number of registers, but must be an upper 
bound on the number of vector registers used and is in the range 0-8 inclusive. 

When passing __m256 arguments to functions that use varargs or stdarg, 
function prototypes must be provided. Otherwise, the run-time behavior is un- 
defined. 


Returning of Values The returning of values is done according to the following 


algorithm: 

1. Classify the return type with the classification algorithm. 

2. If the type has class MEMORY, then the caller provides space for the return 
value and passes the address of this storage in Srdi as if it were the first 
argument to the function. In effect, this address becomes a “hidden” first ar- 
gument. This storage must not overlap any data visible to the callee through 
other names than this argument. 

On return %rax will contain the address that has been passed in by the 
caller in Srdi. 

3. If the class is INTEGER, the next available register of the sequence %rax, 
%rdx is used. 

4. If the class is SSE, the next available vector register of the sequence xmm0O, 
%xmm1 is used. 

5. If the class is SSEUP, the eightbyte is returned in the next available eightbyte 
chunk of the last used vector register. 

6. If the class is X87, the value is returned on the X87 stack in st 0 as 80-bit 
x87 number. 

7. If the class is X87UP, the value is returned together with the previous X87 
value in st0. 

8. If the class is COMPLEX_X87, the real part of the value is returned in 


%st0O and the imaginary part in Sst1. 


As an example of the register passing conventions, consider the declarations 
and the function call shown in Figure The corresponding register allocation 
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is given in Figure [3.6] the stack frame offset given shows the frame before calling 
the function. 


Figure 3.5: Parameter Passing Example 


typedef struct { 
int a, b; 
double d; 
} structparm; 
structparm s; 
Int €;, £2, GO, Ny 2, Jy uke 
long double ld; 
double m, n; 
—_ m256 y; 


extern void func (int e, int f, 
structparm s, int g, int h, 
long double ld, double m, 
_— m256 y, 


double n, int i, int Jj, int k); 


func (e, f, s, g, h, ld, m, y, n, i, J, k); 


Figure 3.6: Register Allocation Example 


General Purpose Registers Floating Point Registers Stack Frame Offset 


Srdi: e Sxmm0: s.d O: ld 
Srsi: f Sxmml: m 16: j 
Srdx: S.a,s.b Symm2: y 24: k 
SrCcx: g Sxmm3: n 

Sxr8: h 

Sr9: 1: 
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3.3. Operating System Interface 


3.3.1 Exception Interface 


As the AMD64 manuals describe, the processor changes mode to handle excep- 
tions, which may be synchronous, floating-point/coprocessor or asynchronous. 
Synchronous and floating-point/coprocessor exceptions, being caused by instruc- 
tion execution, can be explicitly generated by a process. This section, therefore, 
specifies those exception types with defined behavior. The AMD64 architecture 
classifies exceptions as faults, traps, and aborts. See the Intel386 ABI for more 
information about their differences. 


Hardware Exception Types 


The operating system defines the correspondence between hardware exceptions 
and the signals specified by signal (BA_OS) as shown in table B.1] Contrary 
to the 1386 architecture, the AMD64 does not define any instructions that generate 
a bounds check fault in long mode. 


3.3.2 Virtual Address Space 


Although the AMD64 architecture uses 64-bit pointers, implementations are only 
required to handle 48-bit addresses. Therefore, conforming processes may only 
use addresses from 0x00000000 00000000 to OxO0007££FE ffffree | 

Processes begin with three logical segments, commonly called text, data, and 
stack. Use of shared libraries add other segments and a process may dynamically 
create segments. 


3.3.3 Page Size 
Systems are permitted to use any power-of-two page size between 4KB and 64KB, 


inclusive. 


3.3.4 Virtual Address Assignments 


Conceptually processes have the full address space available. In practice, how- 
ever, several factors limit the size of a process. 


'70xOOOOff ff fFFFFFFF is not a canonical address and cannot be used. 
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Table 3.1: Hardware Ex 


ceptions and Signals 


Number Exception name Signal 
0 divide error fault SIGFPE 
1 Single step trap/fault SIGTRAP 
2 non-maskable interrupt none 
3 breakpoint trap SIGTRAP 
4 overflow trap SIGSEGV 
5 (reserved) 
6 invalid opcode fault SIGILL 
if no coprocessor fault SIGFPE 
8 double fault abort none 
9 coprocessor overrun abort SIGSEGV 
10 invalid TSS fault none 
11 segment no present fault none 
12 stack exception fault SIGSEGV 
13 general protection fault/abort | SIGSEGV 
14 page fault SIGSEGV 
1.5 (reserved) 
16 coprocessor error fault SIGFPE 
other | (unspecified) SIGILL 


Table 3.2: Floating-Point Exceptions 


Code 


Reason 


FPE_FLTDIV 
FPE_FLTOVF 


FPE_FLTRES 


floating-point divide by zero 

floating-point overflow 
FPE_FLTUND | floating-point underflow 

floating-point inexact result 


FPE_FLTINV | invalid floating-point operation 
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e The system reserves a configuration dependent amount of virtual space. 
e The system reserves a configuration dependent amount of space per process. 


e A process whose size exceeds the system’s available combined physical 
memory and secondary storage cannot run. Although some physical mem- 
ory must be present to run any process, the system can execute processes 
that are bigger than physical memory, paging them to and from secondary 
storage. Nonetheless, both physical memory and secondary storage are 
shared resources. System load, which can vary from one program execu- 
tion to the next, affects the available amount. 


Programs that dereference null pointers are erroneous and a process should 
not expect Ox0 to be a valid address. 


Figure 3.7: Virtual Address Configuration 


OxfffffrffffffffffFf | Reserved system area | End of memory 


0x80000000000 | Dynamic segments 


0 Process segments Beginning of memory 


Although applications may control their memory assignments, the typical ar- 
rangement appears in figure|3.8 
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Figure 3.8: Conventional Segment Arrangements 


0x80000000000 | Dynamic segments 
Stack segment 


Data segments 


0x400000 Text segments 
0 Unmapped 


3.4 Process Initialization 


3.4.1 Initial Stack and Register State 
Special Registers 


The AMD64 architecture defines floating point instructions. At process startup 
the two floating point units, SSE2 and x87, both have all floating-point exception 
status flags cleared. The status of the control words is as defined in tables |3.3]and 


Table 3.3: x87 Floating-Point Control Word 


Field Value Note 

RC 0 Round to nearest 

PEC 11 Double extended precision 
PM 1 Precision masked 

UM 1 Underflow masked 

OM 1 Overflow masked 

ZM 1 Zero divide masked 

DM 1 De-normal operand masked 
IM 1 Invalid operation masked 
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Table 3.4: MXCSR Status Bits 


Field Value 


Note 


FZ 0 
RC 0 
PM 1 
UM 1 
OM 1 
ZM 1 
DM 1 
IM 1 
DAZ | 0 


Do not flush to zero 

Round to nearest 

Precision masked 
Underflow masked 
Overflow masked 

Zero divide masked 
De-normal operand masked 
Invalid operation masked 
De-normals are not zero 


The rF LAGS register contains the system flags, such as the direction flag and 
the carry flag. The low 16 bits (FLAGS portion) of rF LAGS are accessible by 
application software. The state of them at process initialization is shown in table 


Table 3.5: rF LAGS Bits 
Field Value Note 
DF 0 Direction forward 
CF 0 No carry 
PF 0 Even parity 
AF 0 No auxiliary carry 
ZF 0 No zero result 
Sr 0 Unsigned result 
OF 0) No overflow occurred 
Stack State 


This section describes the machine state that exec (BA_OS) creates for new 
processes. Various language implementations transform this initial program state 
to the state required by the language standard. 


28 


AMD64 ABI Draft 0.99.6 — October 7, 2013 — 10:35 


For example, a C program begins executing at a function named main de- 
clared as: 


extern int main ( int argc , char xargv[ ] , char* envp[ ] ); 


where 
arge is a non-negative argument count 
argv is an array of argument strings, with argv[argc] == 0 


envp is an array of environment strings, terminated by a null pointer. 


When main () returns its value is passed to exit () and if that has been 
over-ridden and returns, _¢xit () (which must be immune to user interposition). 
The initial state of the process stack, i.e. when __start is called is shown in 


figure 


Figure 3.9: Initial Process Stack 


Start Address 
High Addresses 


Purpose Length 
Unspecified 

Information block, including argu- 
ment strings, environment strings, 
auxiliary information ... 
Unspecified 


Null auxiliary vector entry 


varies 


1 eightbyte 


Auxiliary vector entries ... 2 eightbytes each 
0 eightbyte 
Environment pointers ... 1 eightbyte each 
0 8+8xargct+Srsp | eightbyte 
Argument pointers 8+%Srsp argc eightbytes 
Argument count Srsp eightbyte 
Undefined Low Addresses 


Argument strings, environment strings, and the auxiliary information appear 
in no specific order within the information block and they need not be compactly 


allocated. 


Only the registers listed below have specified values at process entry: 
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%rbp The content of this register is unspecified at process initialization time, 
but the user code should mark the deepest stack frame by setting the frame 
pointer to zero. 


%xsp The stack pointer holds the address of the byte with lowest address which 
is part of the stack. It is guaranteed to be 16-byte aligned at process entry. 


%rdx afunction pointer that the application should register with atexit (BA_OS). 


It is unspecified whether the data and stack segments are initially mapped with 
execute permissions or not. Applications which need to execute code on the stack 
or data segments should take proper precautions, e.g., by calling mprotect (). 


3.4.2. Thread State 


New threads inherit the floating-point state of the parent thread and the state is 
private to the thread thereafter. 


3.4.3 Auxiliary Vector 


The auxiliary vector is an array of the following structures (ref. figure |3.10), 
interpreted according to the a_t ype member. 


Figure 3.10: auxv_t Type Definition 


typedef struct 
{ 
int a_type; 
union { 
long a_val; 
void *xa_ptr; 
void (xa_fnc) (); 
} aun; 
} auxv_t; 


The AMD64 ABI uses the auxiliary vector types defined in figure 
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Figure 3.11: Auxiliary Vector Types 


Name Value a_un 
AT_NULL 0 | ignored 
AT_IGNORE 1 | ignored 
AT_EXECFD 2) a_val 
AT _PHDR 3 | a_ptr 
AT_PHENT 4) a_va 
AT_PHNUM 5 | a_va 
AT_PAGESZ 6 | a_va 
AT_BASE 7) a ptr 
AT_FLAGS 8 | a_va 
AT_ENTRY 9| a_ptr 
AT_NOTELF 10 | a_va 
AT_UID 11) a_va 
AT_EUID 12) a_va 
AT_GID 13 | a_va 
AT_EGID 14 a_va 


AT_NULL The auxiliary vector has no fixed length; instead its last entry’s a_t ype 
member has this value. 


AT_IGNORE This type indicates the entry has no meaning. The corresponding 
value of a_un 1s undefined. 


AT_EXECEFD At process creation the system may pass control to an interpreter 
program. When this happens, the system places either an entry of type 
AT_EXECEFD or one of type AT_PHDR in the auxiliary vector. The entry 
for type AT_EXECFD uses the a_val member to contain a file descriptor 
open to read the application program’s object file. 


AT_PHDR The system may create the memory image of the application program 
before passing control to the interpreter program. When this happens, the 
a_jptr member of the AT_PHDR entry tells the interpreter where to find 
the program header table in the memory image. 
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AT_PHENT The a_val member of this entry holds the size, in bytes, of one 
entry in the program header table to which the AT_PHDR entry points. 


AT_PHNUM The a_val member of this entry holds the number of entries in 
the program header table to which the AT_PHDR entry points. 


AT_PAGESZ If present, this entry’s a_val member gives the system page size, 
in bytes. 


AT_BASE The a_ptr member of this entry holds the base address at which the 
interpreter program was loaded into memory. See “Program Header” in the 
System V ABI for more information about the base address. 


AT_FLAGS If present, the a_val member of this entry holds one-bit flags. Bits 
with undefined semantics are set to zero. 


AT_ENTRY The a_ptr member of this entry holds the entry point of the appli- 
cation program to which the interpreter program should transfer control. 


AT_NOTELF The a_val member of this entry is non-zero if the program is in 
another format than ELF. 


AT_UID The a_val member of this entry holds the real user id of the process. 


AT_EUID The a_val member of this entry holds the effective user id of the 
process. 


AT_GID The a_val member of this entry holds the real group id of the process. 


AT_EGID The a_val member of this entry holds the effective group id of the 
process. 


3.5 Coding Examples 


This section discusses example code sequences for fundamental operations such 
as calling functions, accessing static objects, and transferring control from one 
part of a program to another. Unlike previous material, this material is not norma- 
tive. 
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3.5.1 Architectural Constraints 


The AMD64 architecture usually does not allow an instruction to encode arbitrary 
64-bit constants as immediate operand. Most instructions accept 32-bit immedi- 
ates that are sign extended to the 64-bit ones. Additionally the 32-bit operations 
with register destinations implicitly perform zero extension making loads of 64-bit 
immediates with upper half set to 0 even cheaper. 

Additionally the branch instructions accept 32-bit immediate operands that are 
sign extended and used to adjust the instruction pointer. Similarly an instruction 
pointer relative addressing mode exists for data accesses with equivalent limita- 
tions. 

In order to improve performance and reduce code size, it is desirable to use 
different code models depending on the requirements. 

Code models define constraints for symbolic values that allow the compiler to 
generate better code. Basically code models differ in addressing (absolute versus 
position independent), code size, data size and address range. We define only a 
small number of code models that are of general interest: 


Small code model The virtual address of code executed is known at link time. 
Additionally all symbols are known to be located in the virtual addresses in 
the range from 0 to 22! — 224 — 1 or from 000000000 to Ox7ef f f f f 


This allows the compiler to encode symbolic references with offsets in the 
range from —(2°") to 24 or from 080000000 to 0201000000 directly in the 
sign extended immediate operands, with offsets in the range from 0 to 2°! — 
24 or from 0200000000 to 027 f000000 in the zero extended immediate 
operands and use instruction pointer relative addressing for the symbols 
with offsets in the range —(27*) to 274 or 0x f f000000 to 0701000000. 


This is the fastest code model and we expect it to be suitable for the vast 
majority of programs. 


Kernel code model The kernel of an operating system is usually rather small but 
runs in the negative half of the address space. So we define all symbols to 
be in the range from 2° — 23! to 2° — 24 or from Oxf f f f ff f f80000000 


to Oxf ff fff fff f000000. 


'8 The number 24 is chosen arbitrarily. It allows for all memory of objects of size up to 274 
or 16M bytes to be addressed directly because the base address of such objects is constrained to 
be less than 23! — 274 or 027000000. Without such constraint only the base address would be 
accessible directly, but not any offsetted variant of it. 
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This code model has advantages similar to those of the small model, but 
allows encoding of zero extended symbolic references only for offsets from 
23 to 23! + 274 or from 02780000000 to 0781000000. The range offsets 
for sign extended reference changes to 0 to 23! + 274 or 02700000000 to 
0281000000. 


Medium code model In the medium model, the data section is split into two 
parts — the data section still limited in the same way as in the small code 
model and the large data section having no limits except for available ad- 
dressing space. The program layout must be set in a way so that large data 
sections (. ldata, .lrodata, .1bss) come after the text and data sec- 
tions. 


This model requires the compiler to use movabs instructions to access 
large static data and to load addresses into registers, but keeps the advan- 
tages of the small code model for manipulation of addresses in the small 
data and text sections (specially needed for branches). 


By default only data larger than 65535 bytes will be placed in the large data 
section. 


Large code model The large code model makes no assumptions about addresses 
and sizes of sections. 


The compiler is required to use the movabs instruction, as in the medium 
code model, even for dealing with addresses inside the text section. Addi- 
tionally, indirect branches are needed when branching to addresses whose 
offset from the current instruction pointer is unknown. 


It is possible to avoid the limitation on the text section in the small and 
medium models by breaking up the program into multiple shared libraries, 
so this model is strictly only required if the text of a single function becomes 
larger than what the medium model allows. 


Small position independent code model (PIC) Unlike the previous models, the 
virtual addresses of instructions and data are not known until dynamic link 
time. So all addresses have to be relative to the instruction pointer. 


Additionally the maximum distance between a symbol and the end of an 
instruction is limited to 2°! —274—1 or Ox7ef f f f ff, allowing the compiler 
to use instruction pointer relative branches and addressing modes supported 
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by the hardware for every symbol with an offset in the range —(27*) to 274 
or 0x f fO00000 to 0201000000. 


Medium position independent code model (PIC) This model is like the previ- 


ous model, but similarly to the medium static model adds large data sections 
at the end of object files. 


In the medium PIC model, the instruction pointer relative addressing can 
not be used directly for accessing large static data, since the offset can ex- 
ceed the limitations on the size of the displacement field in the instruction. 
Instead an unwind sequence consisting of movabs, lea and add needs to 
be used. 


Large position independent code model (PIC) This model is like the previous 


model, but makes no assumptions about the distance of symbols. 


The large PIC model implies the same limitation as the medium PIC model 
regarding addressing of static data. Additionally, references to the global 
offset table and to the procedure linkage table and branch destinations need 
to be calculated in a similar way. Further the size of the text segment is 
allowed to be up to 16EB in size, hence similar restrictions apply to all 
address references into the text segments, including branches. 


3.5.2 Conventions 


In this document some special assembler symbols are used in the coding examples 
and discussion. They are: 


name@GOT: specifies the offset to the GOT entry for the symbol name 
from the base of the GOT. 


name@GOTPLT: specifies the offset to the GOT entry for the symbol name 
from the base of the GOT, implying that there is a corresponding PLT entry. 


name@GOTOFF: specifies the offset to the location of the symbol name 
from the base of the GOT. 


name@GOTPCREL: specifies the offset to the GOT entry for the symbol 
name from the current code location. 


name@PLT: specifies the offset to the PLT entry of symbol name from the 
current code location. 
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e name@PLTOFF: specifies the offset to the PLT entry of symbol name from 
the base of the GOT. 


e GLOBAL _OFFSET_TABLE_: specifies the offset to the base of the GOT 
from the current code location. 


3.5.3 Position-Independent Function Prologue 


In the small code model all addresses (including GOT entries) are accessible via 
the IP-relative addressing provided by the AMD64 architecture. Hence there is no 
need for an explicit GOT pointer and therefore no function prologue for setting it 
up is necessary. 

In the medium and large code models a register has to be allocated to hold 
the address of the GOT in position-independent objects, because the AMD64 ISA 
does not support an immediate displacement larger than 32 bits. 

As %xr15 is preserved across function calls, it is initialized in the function 
prolog to hold the GOT addresq"|for non-leaf functions which call other functions 
through the PLT. Other functions are free to use any other register. Throughout 
this document, %r15 will be used in examples. 


Figure 3.12: Position-Independent Function Prolog Code 
medium model: 


leag _GLOBAL_OFFSET_TABLE_ (%rip),%r15 GOTPC32 reloc 
large model: 
pushq $r15 # save Sr15 
leag 1f(Srip),%Sr11 # absolute %rip 
1: movabs $ GLOBAL _OFFSET_TABLE_,%r15 # offset to the GOT (R_X86_64_ GOTPC64) 
leag ($r11,%r15),%r15 # absolute address of the GOT 


For the medium model the GOT pointer is directly loaded, for the large model 
the absolute value of rip is added to the relative offset to the base of the GOT 


STF at code generation-time, it is determined that either no other functions are called (leaf 
functions), the called functions addresses can be resolved and are within 2GB, or no global data 
objects are referred to, it is not necessary to store the GOT address in 3x15 and the prolog code 
that initializes it may be omitted. 
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in order to obtain its absolute address (see figure|3.12). 


3.5.4 Data Objects 


This section describes only objects with static storage. Stack-resident objects are 
excluded since programs always compute their virtual address relative to the stack 
or frame pointers. 

Because only the movabs instruction uses 64-bit addresses directly, depend- 
ing on the code model either Srip-relative addressing or building addresses in 
registers and accessing the memory through the register has to be used. 

For absolute addresses % r ip-relative encoding can be used in the small model. 
In the medium model the movabs instruction has to be used for accessing ad- 
dresses. 

Position-independent code cannot contain absolute address. To access a global 
symbol the address of the symbol has to be loaded from the Global Offset Table. 
The address of the entry in the GOT can be obtained with a rip-relative instruc- 
tion in the small model. 
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Small models 


Figure 3.13: Absolute Load and Store (Small Model) 


extern int src[65536]; 
extern int dst[65536]; 
extern int x*ptr; 

static int lsrc[65536]; 
static int ldst[65536]; 
static int *lptr; 

dst [0] = src[0]; 

ptr = dst[0]; 

xptr = src[0]; 

ldst [0] = lsrc[0]; 

lptr = ldst; 

siptr = lsre[0); 


extern src 

.extern dst 

extern ptr 

. local lsrc 

.comm lsrc,262144,4 

. local ldst 

.comm ldst, 262144,4 

. local lptr 

.comm letr, 8,8 

-text 

mov Ll src(%rip), %eax 
mov Ll Seax, dst (Srip) 
movq Sdst, ptr (%Srip) 
movg ptr(%rip),%rax 
movl src(%rip), sedx 
movil Sedx, (%rax) 

mov L lsrce(%rip), %eax 
mov 1 Seax, dst (%rip) 
mMovg Sdst, lptr(%rip) 
Movg ptr (%rip),%rax 
mov Ll src(%rip), sedx 
mov 1 Sedx, (%rax) 
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Figure 3.14: Position-Independent Load and Store (Small PIC Model) 


extern int src[65536]; extern src 
extern int dst[65536]; extern dst 
extern int *ptr; extern ptr 
static int lsrc[65536]; ocal Ilsrc 
comm lsrc,262144,4 
static int ldst[65536]; oca ldst 
comm ldst,262144,4 
static int *lptr; oca lptr 
comm ptr ;.3: 28 
text 
dst [0] = src[0]; novg src@GOTPCREL(%Srip), %Srax 
movi (Srax), %edx 
novg dst@GOTPCREL(%rip), %rax 
movi Sedx, (%Srax) 
ptr = dst; novg ptr@GOTPCREL(%rip), %rax 
novg dst@GOTPCREL(%rip), %rdx 
novg Srdx, (%Srax) 
xptr = src[0]; novg ptr@GOTPCREL (%rip), rax 
novq (Srax), %rdx 
novg Ssrc@GOTPCREL(%Srip), %Srax 
movi (Srax), %eax 
movi Seax, (%Srdx) 
ldst [0] = lsrc[0]; mov 1 lsrce(%Srip), %eax 
mov 1 Seax, ldst (Srip) 
lptr = ldst; lea ldst (rip) , srdx 
movq Srdx, lptr(%Srip) 
*lptr = 1lsrc[0]; movg ptr (%rip),%rax 
movl lsre(%Srip) , sedx 
movl Sedx, (%Srax) 
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Medium models 


Figure 3.15: Absolute Load and Store (Medium Model) 


extern in 
extern in 
extern in 


static in 


static in 


static in 


src[65536]; 
dst [65536]; 
*ptr; 

lsrc[65536]; 


Ch Gh ach ct 


t ldst [65536]; 


ic ei pers 


-extern src 
.extern dst 
extern ptr 
local lsrc 
.comm Lsrc, 262144, 4% 
local ldst 
.comm ldst, 262144, 4 
local LptEe 
.comm lptr, 8,8 
-text 
movabsl src, %eax 
movabsl %eax, dst 
movabsq Sdst, %rdx 
movq Srdx, ptr 
movg ptr (%rip) , srdx 
movabsl src, %eax 
movil Seax, (%rdx) 
movabsl lsrc, %eax 
movabsl %eax, ldst 
movabsq Sldst, %rdx 
movabsq *rdx, lptr 
movg lptr (%rip),%rdx 
movabsl Ilsrc, %eax 
movil Seax, (%rdx) 
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Figure 3.16: 


Position-Independent Load and Store (Medium PIC Model) 


extern int src[65536]; 
extern int dst[65536]; 
extern int *ptr; 

static int lsrc[65536]; 


static int ldst[65536]; 


static int *lptr; 
dst [0] = src[0]; 
ptr = dst; 

xptr = src[0]; 


Srax 


Srax 


Srax 
Srdax 


Srax 


extern src 
extern dst 
extern ptr 
ocal lsrc 
comm lsrc,262144,4 
oca ldst 
comm ldst,262144,4 
oca lptr 
comm ptr ;.3: 28 
text 
novg src@GOTPCREL(%Srip), 
movi (Srax), %edx 
novg dst @GOTPCREL (%rip), 
movi Sedx, (%Srax) 
novg ptr@GOTPCREL(%Srip), 
novg dst @GOTPCREL (%rip), 
novg Srdx, (%Srax) 
novg ptr@GOTPCREL (%rip), rax 
novq (Srax), %rdx 
novg src@GOTPCREL(%Srip), 
movi (Srax), %eax 
movi Seax, (%Srdx) 
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Figure 3.17: Position-Independent Load and Store (Medium PIC Model), contin- 
ued 


ldst [0] = lsrc[0]; movabsq lsrc@GOTOFF64, %rax 
movl (Srax,Sr15), %eax 
movabsq ldst@GOTOFF64, %Srdx 
movl Seax, (srdx, 6r15) 

lptr = ldst; movabsq ldst@GOTOFF64, %Srax 
addg Sr15, Srax 
movq Srax, lptr(%Srip) 

*lptr = lsrc[0]; movabsq lsrc@GOTOFF64, %rax 
movl (Srax, Sr15),%eax 
movq lptr(%rip), srdx 
movl Seax, (srdx) 


Large Models 


Again, in order to access data at any position in the 64-bit addressing space, it is 
necessary to calculate the address explicitly?" not unlike the medium code model. 


21 Tf, at code generation-time, it is determined that a referred to global data object address is 
resolved within 2GB, the 3rip—relative addressing mode can be used instead. See example 


in figure 
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Figure 3.18: Absolute Global Data Load and Store 


static int src; Lsrce: .long 

static int dst; Ldst: .long 

extern int «ptr; -extern ptr 

dst = src; movabs S$Lsrc, %rax R_X86_64_64 
movabs SLdst, %rdx R_X86_64 64 
movl (Srax) , SeCX 
movl S$ecx, (Srdx) 

ptr = &dst; movabs Sptr,%rax R_X86_64_ 64 
movabs S$Ldst, %rdx R_X86_64 64 
movq S$rdax, (Srax) 

*xptr = src; movabs S$Lsrc, %rax R_X86_64_64 
movabs Sptr,%rdx R_X86_64_ 64 
movl (Srax) , SeCX 
movgq (Srdx) , srdx 
movl S$ecx, (Srdx) 


Figure 3.19: Faster Absolute Global Data Load and Store 


movabs 
movl 
movgq 


movl 


Sptr, srdx 


Lsrc(%rip),%ecx 


(Srdx) , srdx 
S$ecx, (Srdx) 


, 


R_X86_64_64 


For position-independent code access to both static and external global data 


assumes that the GOT address is stored in a dedicated register. In these examples 
we assume it is in $r1 SP] (see Function Prologue): 


If, at code generation-time, it is determined that a referred to global data object address is 


resolved within 2GB, the 3rip—relative addressing mode can be used instead. See example 


in figure 
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Figure 3.20: Position-Independent Global Data Load and Store 


static int src; Lsre: .long 

static int dst; Ldst: .long 

extern int «ptr; .extern ptr 

dst = src; movabs SLsrc@GOTOFF, Srax ; R_X86_64_GOTOFF64 
movabs SLdst@GOTOFF,%rdx ; R_X86_64_ GOTOFF64 
movl (Srax, $r15),%eCcx 
movil Secx, (Srdx, Sr15) 

ptr = &dst; movabs Sptr@GOT, Srax ; R_X86_64_ GOT64 
movabs SLdst@GOTOFF, ¢rdx ; R_X86_64_GOTOFF64 
movq (Srax, 6r15),%rax 
leaq (Srdx, 6r15),%rcx 
movq Srcx, (Srax) 

xptr = src; movabs S$Lsrc@GOTOFF, %rax ; R_X86_64_ GOTOFF64 
movabs Sptr@GOT, rdx ; R_X86_64_ GOT64 
movl (Srax, 6r15),%ecx 
movq (Srdx, 6r15),%rdx 
movl Secx, (Srdx) 


Figure 3.21: Faster Position-Independent Global Data Load and Store 


xptr 


Src, 


movabs 
movl 
movg 


movl 


Sptr@GOT, Srdx : 
Lsre(%Srip) ,%ecx 


(3 


Srdx, sr15),%rdx 


Secx, (Srdx) 


R_X86_64_GOT64 
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3.5.5 Function Calls 
Small and Medium Models 


Figure 3.22: Position-Independent Direct Function Call (Small and Medium 


Model) 


extern void function 


(); 


function 


lobl function 
11 function@PLT 


(); 


Ca 


Figure 3.23: Position-Independent Indirect Function Call 


extern void (xptr) (); 
extern void name (); 
ptr = name; 

(xptr) (); 


-gGlobl ptr, name 
movgq ptr@GOTPCREL (%Srip), 
movq name@GOTPCREL(%rip), 
movq Srdx, (%rax) 
movgq ptr@GOTPCREL(%Srip), 


call x* (%rax) 


Srax 
Srdax 


Srax 


Large models 


It cannot be assumed that a function is within 2GB in general. Therefore, it is 
necessary to explicitly calculate the desired address reaching the whole 64-bit 


address space. 
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Figure 3.24: Absolute Direct and Indirect Function Call 


static void (*ptr) (void); | Lptr: .quad 
extern void foo (void); -globl foo 
static void bar (void); Lbar: 
foo (); movabs Sfoo, rl ; R_X86_64 64 
call *Sr11 
bar (); movabs SLbar,%rll ; R_X86_64 64 
call *Sr11 
ptr = foo; movabs SLptr,%rax ; R_X86_64_64 
movabs Sfoo, rl ; R_X86_64 64 
movq S$r11, (Srax) 
ptr = bar; movabs SLbar,%rll ; R_X86_64 64 
movq Srl1l1, (Srax) 
(xptr) (); movabs SLptr,%rll ; R_X86_64_64 
call * ($r11) 
And in the case of position-independent objects 
Figure 3.25: Position-Independent Direct and Indirect Function Call 
static void (*ptr) (void); | Lptr: .quad 
extern void foo (void); -globl foo 
static void bar (void); Lbar: 
foo (); movabs Sfoo@GOT, $r11 ; R_x86_64_ GOTPLT64 
call *($r11,%r15) 
bar (); movabs SLbar@GOTOFF, rll ; R_X86_64 GOTOFF64 
leaq ($r11,%r15),%r1l 
call *Sr11 
ptr = foo; movabs SLptr@GOTOFF, Srax ; R_X86_64 GOTOFF64 
movabs Sfoo@PLTOFF,%r1ll ; R_X86_64 PLTOFF64 
leaq (6211, 3615), ri 
movq $r11, ($rax, $r15) 
ptr = bar; movabs SLbar@GOTOFF, rll ; R_X86_64 GOTOFF64 
leaq (3201 1:35) oe 
movgq $r11, (Srax, $r15) 
(xptr) (); movabs SLptr@GOTOFF, $r11 ; R_X86_64_ GOTOFF64 
call *($r11,%r15) 


3See subsection “Implementation advice” for some optimizations. 
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Implementation advice 


If, at code generation-time, certain conditions are determined, it’s possible to 
generate faster or smaller code sequences as the large model normally requires. 
When: 


(absolute) target of function call is within 2GB , a direct call or $rip-relative 
addressing might be used: 


bar (); call Lbar 

ptr = bar; movabs SLptr, %rax ; R_X86_64_ 64 
leaq SLbar (%rip),%r1l 
movq Sr1l1, (Srax) 


(PIC) the base of GOT is within 2GB an indirect call to the GOT entry might 
be implemented like so: 
foo (); call *(foo@GOT) ; R_X86_64_GOTPCREL 


(PIC) the base of PLT is within 2GB , the PLT entry may be referred to rela- 
tively to Srip: 


ptr = foo; movabs SLptr@GOTOFF, trax ; R_X86_64 GOTOFF64 
leaq Sfoo@PLT (Srip),%rll ; R_X86_64 _PLT32 
movq S$r1l1l, (Srax, $r15) 


(PIC) target of function call is within 2GB and is either not global or bound lo- 
cally, a direct call to the symbol may be used or it may be referred to rela- 
tively to Srip: 


bar (); call Lbar 

ptr = bar; | movabs SLptr@GOTOFF,%rax ; R_X86_64 GOTOFF64 
leag SLbar (Srip),%r11 
movq S$r1l1, (Srax, r15) 


3.5.6 Branching 

Small and Medium Models 

As all labels are within 2GB no special care has to be taken when implementing 
branches. The full AMD64 ISA is usable. 

Large Models 


Because functions can be theoretically up to 16EB long, the maximum 32-bit 
displacement of conditional and unconditional branches in the AMD64 ISA are 
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not enough to address the branch target. Therefore, a branch target address is 
calculated explicitly 4] For absolute objects: 


Figure 3.26: Absolute Branching Code 


if (ta) testl Seax, eax 

{ jnz ie 
movabs S2f,%r11 ; R_X86_64_64 
jmpq *or1l 

teed i: 

} 2: 

goto Label; movabs SLabel,%rll ; R_X86_64_64 
jmpq *Sr11 

Label: Label: 


Figure 3.27: Implicit Calculation of Target Address 


if (ta) testl Seax, Seax 
{ 4z 2f 

is 
} 2 
goto Label; jmp Label 
Label: Label: 


For position-independent objects: 


-4Tf, at code generation-time, it is determined that the target addresses are within 2GB, alterna- 
tively, branch target addresses may be calculated implicitly (see figure 3.27) 
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Figure 3.28: Position-Independent Branching Code 


Label: 


if (ta) testl Seax, teax 

{ jnz Le 
movabs S2£f@GOTOFF, Sr11 ; R_X86_64_GOTOFF64 
leag ($r11,%r15),%r11 
jmpgq *Sr11 

} 

goto Label; movabs SLabel@GOTOFF, Sr11 ; R_X86_64 GOTOFF64 
leag ($r11,%r15),%r11 
jmpgq *or11 
Label: 


For absolute objects, the implementation of the switch statement is: 


Figure 3.29: Absolute Switch Code 


switch 


{ 


(a) 


default: 
case 0: 


case 2: 


.Ltable: 


-al 
qu 
qu 
qu 


cm 
ae 
cm 
jg 
mo 
3m 
-Ss 


.pr 


.Ldefault: 


.Lcase0: 


.Lcoase2: 


pl SO, eax 
.Ldefault 

pL $2,%eax 
.Ldefault 

vabs $.Ltable,%rll ; 

pg x (%r1l1,%eax, 8) 

ection .lrodata,"aLM", 

ign 8 

ad .Lcase0 f 

ad .Ldefault . 

ad .Lcase2 : 

evious 


R_X86_64_64 
@progbits,8 
R_X86_64_64 


R_X86_64_64 
R_X86_64_64 


When building position-independent objects, the switch statement imple- 


mentation changes to: 
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Figure 3.30: Position-Independent Switch Code 


switch (a) cmpl SO, eax 
{ jl .Ldefault 
cmp $2,%eax 
jg .Ldefault 
movabs S.Ltable@GOTOFF,%rll1 ; R_X86_64 GOTOFF64 
leaq (S$r11,%r15),%Sr11 
movq *($r11,%eax,8),Sr11 
leaq (S$r11,%r15),%Sr11 
jmpq *Sr11 
-section .lrodata,"aLM", @progbits, 8 
-align 8 
.Ltable: .quad .Lcase0@GOTOFF ; R_X86_64 GOTOFF64 
.quad .Ldefault@GOTOFF ; R_X86_64 GOTOFF64 
-quad .Lcase2@GOTOFF ; R_X86_64_ GOTOFF64 
-previous 
default: .Ldefault: 
case 0: .Lcase0: 
case 2: .Lcase2: 
} 


25) 


3.5.7 Variable Argument Lists 


Some otherwise portable C programs depend on the argument passing scheme, 
implicitly assuming that all arguments are passed on the stack, and arguments 
appear in increasing order on the stack. Programs that make these assumptions 
never have been portable, but they have worked on many implementations. How- 
ever, they do not work on the AMD64 architecture because some arguments are 
passed in registers. Portable C programs must use the header file <stdarg.h> 
in order to handle variable argument lists. 

When a function taking variable-arguments is called, srax must be set to the 
total number of floating point parameters passed to the function in vector regis- 


>>The jump-table is emitted in a different section so as to occupy cache lines without instruction 
bytes, thus avoiding exclusive cache subsystems to thrash. 
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ters |?4| 


When __m256 is passed as variable-argument, it should always be passed on 
stack. Only named __m256 arguments may be passed in register as specified in 


section 


Figure 3.31: Parameter Passing Example with Variable-Argument List 


int a, b; 

long double ld; 
double m, n; 

_ m256 u, y; 


extern void func (int a, double m, __m256 u, ...); 


func (a, m, u, b, ld, y, n); 


Figure 3.32: Register Allocation Example for Variable-Argument List 


General Purpose Registers Floating Point Registers Stack Frame Offset 


Srdi: a Sxmm0: m O: ld 
Srsi: b Symml: u 32: y 
$rax: 3 Sxmm2: n 

The Register Save Area 


The prologue of a function taking a variable argument list and known to call the 
macro va_start is expected to save the argument registers to the register save 
area. Each argument register has a fixed offset in the register save area as defined 
in the figure [3.33] 

Only registers that might be used to pass arguments need to be saved. Other 
registers are not accessed and can be used for other purposes. If a function is 


6This implies that the only legal values for %rax when calling a function with variable- 
argument lists are 0 to 8 (inclusive). 
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known to never accept arguments passed in registers””| the register save area may 
be omitted entirely. 

The prologue should use %rax to avoid unnecessarily saving XMM registers. 
This is especially important for integer only programs to prevent the initialization 
of the XMM unit. 


Figure 3.33: Register Save Area 


Register Offset 


Srdi 0 
Srsi 8 
Srdx 16 
SYCx 24 
Sr8 32 
Sxr9 40 
%xmm0 48 
&xmm1 64 
Sxmm15 288 


The va_list Type 


The va_list type is an array containing a single element of one structure con- 
taining the necessary information to implement the va_arg macro. The C defi- 
nition of va_list type is given in figure 


°7This fact may be determined either by exploring types used by the va_arg macro, or by the 
fact that the named arguments already are exhausted the argument registers entirely. 
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Figure 3.34: va_list Type Declaration 


typedef struct { 
unsigned int gp_offset; 
unsigned int fp_offset; 
void *xoverflow_arg_area; 
void x*reg_save_area; 

} va_list[1]; 


The va_start Macro 


The va_start macro initializes the structure as follows: 


reg_save_area The element points to the start of the register save area. 


overflow_arg_area This pointer is used to fetch arguments passed on the stack. 
It is initialized with the address of the first argument passed on the stack, if 
any, and then always updated to point to the start of the next argument on 
the stack. 


gp_offset The element holds the offset in bytes from reg_save_area to the 
place where the next available general purpose argument register is saved. 
In case all argument registers have been exhausted, it is set to the value 48 
(6 x 8). 


fp_offset The element holds the offset in bytes from reg_save_area to the 
place where the next available floating point argument register is saved. In 
case all argument registers have been exhausted, it is set to the value 304 
(6* 8+ 16 « 16). 


The va_arg Macro 


The algorithm for the generic va_arg(1, type) implementation is defined as 
follows: 


1. Determine whether t ype may be passed in the registers. If not go to step 
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10. 
Ts 


. Compute num_gp to hold the number of general purpose registers needed 


to pass type and num_fp to hold the number of floating point registers 
needed. 


. Verify whether arguments fit into registers. In the case: 


1->gp_offset > 48 —num_gp «8 


or 
1->fp_offset > 304 — num_fp «16 


go to step 


. Fetcht ype from 1—>reg_save_area with an offset of 1->gp_offset 


and/or 1->fp_offset. This may require copying to a temporary loca- 
tion in case the parameter is passed in different register classes or requires 
an alignment greater than 8 for general purpose registers and 16 for XMM 
registers. 


Set? 
1->gp_offset = 1->gp_offset +num_gp*8 


1->fp_offset = 1->fp_offset +num_fp* 16. 


. Return the fetched type. 


. Align 1->overflow_arg_area upwards to a 16 byte boundary if align- 


ment needed by t ype exceeds 8 byte boundary. 


. Fetch type from 1->overflow_arg_area. 


. Set 1->overflow_arg_area to: 


1->overflow_arg_area+ sizeof(type) 


Align 1->overflow_arg_area upwards to an 8 byte boundary. 


Return the fetched type. 
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The va_arg macro is usually implemented as a compiler builtin and ex- 
panded in simplified forms for each particular type. Figure is a sample 1m- 
plementation of the va_arg macro. 


Figure 3.35: Sample Implementation of va_arg(l, int) 


movl 
cmpl 
jae 
leal 
addq 
movl 
jmp 
stack: movq 
leaq 
movq 
fetch: movl 


1l->gp_offset, steax 

$48, eax 

stack 

$8(3rax), %edx 
1->reg_save_area, Srax 


Sedx, 1->gp_offset 

fetch 

1->overflow_arg_area, trax 
8(Srax), srdx 
Srdx,l->overflow_arg_area 
(Srax), eax 


Is register available? 

If not, use stack 

Next available register 
Address of saved register 
Update gp_offset 


Address of stack slot 
Next available stack slot 
Update 

Load argument 


3.6 DWARF Definition 


This sectior|”*|defines the Debug With Arbitrary Record Format (DWARF) debug- 
ging format for the AMD64 processor family. The AMD64 ABI does not define a 
debug format. However, all systems that do implement DWARF on AMD64 shall 
use the following definitions. 

DWARF is a specification developed for symbolic, source-level debugging. 
The debugging information format does not favor the design of any compiler or 
debugger. For more information on DWARF, see DWARF Debugging Informa- 
tion Format, revision: Version 3, January, 2006, Free Standards Group, DWARF 


Standard Committee. It’s available at: |nttp://www.dwarfstd.org/ 


*8This section is structured in a way similar to the PowerPC psABI 
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3.6.1 DWARF Release Number 


The DWARF definition requires some machine-specific definitions. The register 
number mapping needs to be specified for the AMD64 registers. In addition, the 
DWARF Version 3 specification requires processor-specific address class codes to 
be defined. 


3.6.2 DWARF Register Number Mapping 
Table ?/outlines the register number mapping for the AMD64 processor fam- 


ilyP| 


3.7. Stack Unwind Algorithm 


The stack frames are not self descriptive and where stack unwinding is desirable 
(such as for exception handling) additional unwind information needs to be gen- 
erated. The information is stored in an allocatable section .eh_frame whose 
format is identical to . debug_frame defined by the DWARF debug informa- 
tion standard, see DWARF Debugging Information Format, with the following 
extensions: 


Position independence In order to avoid load time relocations for position inde- 
pendent code, the FDE CIE offset pointer should be stored relative to the 
start of CIE table entry. Frames using this extension of the DWARE stan- 
dard must set the CIE identifier tag to 1. 


Outgoing arguments area delta To maintain the size of the temporarily allo- 
cated outgoing arguments area present on the end of the stack (when us- 
ing push instructions), operation GNU_LARGS_SIZE (0x2e) can be used. 
This operation takes a single uleb128 argument specifying the current 
size. This information is used to adjust the stack frame when jumping into 
the exception handler of the function after unwinding the stack frame. Ad- 
ditionally the CIE Augmentation shall contain an exact specification of the 
encoding used. It is recommended to use a PC relative encoding whenever 
possible and adjust the size according to the code model used. 


°The table defines Return Address to have a register number, even though the address is stored 
in 0(%rsp) and not in a physical register. 
3°This document does not define mappings for privileged registers. 
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Figure 3.36: DWARF Register Number Mapping 


Register Name Number Abbreviation 
General Purpose Register RAX 0 | Srax 
General Purpose Register RDX 1 | $rdx 
General Purpose Register RCX 2. | SEOK 
General Purpose Register RBX 3 | Srbx 
General Purpose Register RSI 4 | Srsi 
General Purpose Register RDI >. | sds 
Frame Pointer Register RBP 6 | Srbp 
Stack Pointer Register RSP 7 | Srsp 
Extended Integer Registers 8-15 8-15 | Sr8-Sr15 
Return Address RA 16 
Vector Registers 0—7 17-24 | $xmm0-%xmm7 
Extended Vector Registers 8—15 25-32 | $xmm8-—%xmm15 
Floating Point Registers 0-7 33-40 | 3st0-%st7 
MMxX Registers 0-7 41-48 | SmmO0-—%mm7 
Flag Register 49 | SrF LAGS 
Segment Register ES 50 | Ses 
Segment Register CS SL.) 26s 
Segment Register SS 52: Ss 
Segment Register DS 53 | Sds 
Segment Register FS 54 | Sfs 
Segment Register GS 55-| 2gs 
Reserved 56-57 
FS Base address 58 | sfs.base 
GS Base address 59 | %gs.base 
Reserved 60-61 
Task Register 62 | Str 
LDT Register 63 | Sldtr 
128-bit Media Control and Status 64 | Smxcsr 
x87 Control Word 65 | Sfcw 
x87 Status Word 66 | Sfsw 
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Figure 3.37: Pointer Encoding Specification Byte 


Mask Meaning 
Ox1 | Values are stored as uleb128 or sleb128 type (according to flag 0x8) 
0x2 | Values are stored as 2 bytes wide integers (udat a2 or sdata2) 
0x3 | Values are stored as 4 bytes wide integers (udat a4 or sdata4) 
0x4 | Values are stored as 8 bytes wide integers (udat a8 or sdata8) 
Ox8 | Values are signed 
0x10 | Values are PC relative 
0x20 | Values are text section relative 
0x30 | Values are data section relative 
0x40 | Values are relative to the start of function 


CIE Augmentations: The augmentation field is formated according to the aug- 
mentation field formating string stored in the CIE header. 


The string may contain the following characters: 


z Indicates that a uleb128 is present determining the size of the augmen- 


tation section. 


L Indicates the encoding (and thus presence) of an LSDA pointer in the 


FDE augmentation. 

The data filed consist of single byte specifying the way pointers are 
encoded. It is a mask of the values specified by the table B.37| 

The default DWARF3 pointer encoding (direct 4-byte absolute point- 
ers) is represented by value 0. 


R Indicates a non-default pointer encoding for FDE code pointers. The 


formating is represented by a single byte in the same way as in the ‘L’ 
command. 


P Indicates the presence and an encoding of a language personality routine 


in the CIE augmentation. The encoding is represented by a single byte 
in the same way as in the ’L’ command followed by a pointer to the 
personality function encoded by the specified encoding. 


When the augmentation is present, the first command must always be ‘z’ to 
allow easy skipping of the information. 
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In order to simplify manipulation of the unwind tables, the runtime library 
provide higher level API to stack unwinding mechanism, for details see section 
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Chapter 4 
Object Files 


4.1 ELF Header 


4.1.1 Machine Information 


For file identification in e_ident, the AMD64 architecture requires the follow- 
ing values. 


Table 4.1: AMD64 Identification 
Position Value 


e_ident [EI_CLASS] | ELFCLASS64 
e_ident [EI_DATA] ELFDATA2LSB 


Processor identification resides in the ELF headers e_ machine member and 
must have the value EM_X86_64 


4.1.2 Number of Program Headers 


The e_phnum member contains the number of entries in the program header 
table. The product of e_phentsize and e_phnum gives the table’s size in 
bytes. If a file has no program header table, e_phnum holds the value zero. 


'The value of this identifier is 62. 
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If the number of program headers is greater than or equal to PN_XNUM (Oxffff), 
this member has the value PN_XNUM (Oxffff). The actual number of program 
header table entries is contained in the sh_info field of the section header at 
index 0. Otherwise, the sh_info member of the initial entry contains the value 
zero. 


4.2 Sections 


4.2.1 Section Flags 


In order to allow linking object files of different code models, it is necessary to 
provide for a way to differentiate those sections which may hold more than 2GB 
from those which may not. This is accomplished by defining a processor-specific 
section attribute flag for sh_flag (see table[4.2). 


Table 4.2: AMD64 Specific Section Header Flag, sh_flags 


Name Value 
SHF_X86_64_LARGE | 0x10000000 


SHF_X86_64 LARGE If an object file section does not have this flag set, then 
it may not hold more than 2GB and can be freely referred to in objects using 
smaller code models. Otherwise, only objects using larger code models can 
refer to them. For example, a medium code model object can refer to data 
in a section that sets this flag besides being able to refer to data in a section 
that does not set it; likewise, a small code model object can refer only to 
code in a section that does not set this flag. 
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4.2.2 Section types 


Table 4.3: Section Header Types 


sh_type name Value 
SHT_X86_64_UNWIND | 0x70000001 


SHT_X86_64_UNWIND This section contains unwind function table entries for 
stack unwinding. The contents are described in Section of this docu- 
ment. 


4.2.3 Special Sections 


Table 4.4: Special sections 


Name Type Attributes 
-got SHT_PROGBITS SHF_ALLOC+SHF_WRITE 
-plt SHT_PROGBITS SHF_ALLOC+SHF_EXECINSTR 


.eh_frame | SHT_X86_64 UNWIND | SHF_ALLOC 


.got This section holds the global offset table. 
.plt This section holds the procedure linkage table. 


.eh_frame This section holds the unwind function table. The contents are de- 
scribed in Section of this document. 


The additional sections defined in table are used by a system supporting 
the large code model. 
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Table 4.5: Additional Special Sections for the Large Code Model 


Name Type Attributes 

.lbss SHT_NOBITS SHF_ALLOC+SHF_WRITE+SHF_X86_64_ LARGE 
.ldata SHT_PROGBITS | SHF_ALLOC+SHF_WRITE+SHF_X86_64 LARGE 
.ldatal SHT_PROGBITS | SHF_ALLOC+SHF_WRITE+SHF_X86_64 LARGE 
-lgot SHT_PROGBITS | SHF_ALLOC+SHF_WRITE+SHF_X86_64 LARGE 
ALE SHT_PROGBITS | SHF_ALLOC+SHF_EXECINSTR+SHF_X86_64_ LARGE 
.lrodata SHT_PROGBITS | SHF_ALLOC+SHF_X86_64_ LARGE 

.lrodatal | SHT_PROGBITS | SHF_ALLOC+SHF_X86_64_ LARGE 

-ltext SHT_PROGBITS | SHF_ALLOC+SHF_EXECINSTR+SHF_X86_64 LARGE 


In order to enable static linking of objects using different code models, the 
following section ordering is suggested: 


-plt .init .fini .text .got .rodata .rodatal .data .datal .bss 
These sections can have a combined size of up to 2GB. 


-lplt .ltext .lgot .lrodata .lrodatal .ldata .ldatal .1lbss 
These sections plus the above can have a combined size of up to 16EB. 


4.2.4 EH FRAME sections 


The call frame information needed for unwinding the stack is output into one or 
more ELF sections of type SHT_X86_64 UNWIND. In the simplest case there 
will be one such section per object file and it will be named .eh_frame. An 
.eh_frame section consists of one or more subsections. Each subsection con- 
tains a CIE (Common Information Entry) followed by varying number of FDEs 
(Frame Descriptor Entry). A FDE corresponds to an explicit or compiler gener- 
ated function in a compilation unit, all FDEs can access the CIE that begins their 
subsection for data. If the code for a function is not one contiguous block, there 
will be a separate FDE for each contiguous sub-piece. 

If an object file contains C++ template instantiations there shall be a separate 
CIE immediately preceding each FDE corresponding to an instantiation. 
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Using the preferred encoding specified below, the .eh_frame section can be 
entirely resolved at link time and thus can become part of the text segment. 

EH_PE encoding below refers to the pointer encoding as specified in the en- 
hanced LSB Chapter 7 for En_Frame_Hdr. 
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Table 4.6: Common Information Entry (CIE) 


Field Length (byte) Description 

Length 4 Length of the CIE (not including this 4- 
byte field) 

CIE id 4 Value 0 for .eh_frame (used to distin- 
guish CIEs and FDEs when scanning the 
section) 

Version 1 Value One (1) 

CIE Augmenta- | string Null-terminated string with legal values 

tion String being "" or ’z’ optionally followed by sin- 
gle occurrances of ’P’, ’L’, or ’R’ in any 
order. The presence of character(s) in the 
string dictates the content of field 8, the 
Augmentation Section. Each character has 
one or two associated operands in the AS 
(see table for which ones). Operand 
order depends on position in the string (z’ 
must be first). 

Code Align Fac- | uleb128 To be multiplied with the "Advance Lo- 

tor cation" instructions in the Call Frame In- 
structions 

Data Align Fac- | sleb128 To be multiplied with all offsets in the Call 

tor Frame Instructions 

Ret Address Reg | 1/uleb128 A "virtual" register representation of the 
return address. In Dwarf V2, this is a byte, 
otherwise it is uleb128. It is a byte in gcc 
3.3.x 

Optional CIE | varying Present if Augmentation String in Aug- 

Augmentation mentation Section field 4 is not 0. See table 

Section [4.7] for the content. 

Optional Call | varying 

Frame _ Instruc- 

tions 
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Table 4.7: CIE Augmentation Section Content 


Char Operands Length (byte) Description 

Z size uleb128 Length of the remainder of the Augmen- 
tation Section 

P personality_enc] 1 Encoding specifier - preferred value is a 
pe-relative, signed 4-byte 

personality (encoded) Encoded pointer to personality routine 
routine (actually to the PLT entry for the per- 
sonality routine) 

R code_enc 1 Non-default encoding for the 
code-pointers (FDE members 
initial_location and 
address_range and the operand for 
DW_CFA_set_loc) - preferred value 
is pc-relative, signed 4-byte 

L Isda_enc 1 FDE augmentation bodies may contain 


LSDA pointers. If so they are encoded 
as specified here - preferred value is pc- 
relative, signed 4-byte possibly indirect 
thru a GOT entry 
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Field 


Table 4.8: Frame Descriptor Entry (FDE) 


Length (byte) 


Description 


Length 


CIE pointer 


Initial Location 


Address Range 


Optional FDE 
Augmentation 
Section 
Optional 
Frame 
tions 


Call 
Instruc- 


4 


4 


var 


var 


var 


var 


Length of the FDE (not including this 4- 
byte field) 

Distance from this field to the nearest pre- 
ceding CIE (the value is subtracted from 
the current address). This value can never 
be zero and thus can be used to distin- 
guish CIE’s and FDE’s when scanning the 
.eh_frame section 

Reference to the function code correspond- 
ing to this FDE. If ’R’ is missing from 
the CIE Augmentation String, the field is 
an 8-byte absolute pointer. Otherwise, the 
corresponding EH_PE encoding in the CIE 
Augmentation Section is used to interpret 
the reference 

Size of the function code corresponding to 
this FDE. If ’R’ is missing from the CIE 
Augmentation String, the field is an 8-byte 
unsigned number. Otherwise, the size is 
determined by the corresponding EH_PE 
encoding in the CIE Augmentation Section 
(the value is always absolute) 

Present if CIE Augmentation String is non- 
empty. See table|4.9|for the content. 
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Table 4.9: FDE Augmentation Section Content 


Char Operands Length (byte) Description 

Z length uleb128 Length of the remainder of the Augmen- 
tation Section 

L LSDA var LSDA pointer, encoded in the format 


specified by the corresponding operand 
in the CIE’s augmentation body. (only 
present if length > 0). 


The existence and size of the optional call frame instruction area must be com- 
puted based on the overall size and the offset reached while scanning the preceding 
fields of the CIE or FDE. 

The overall size of a .eh_frame section is given in the ELF section header. 
The only way to determine the number of entries is to scan the section until the 
end, counting entries as they are encountered. 


4.3. Symbol Table 


The discussion of "Function Addresses" in Section|5.2|defines some special values 
for symbol table fields. 

The STT_GNU_IFUNC Plsymbol type is optional. It is the same as STT_FUNC 
except that it always points to a function or piece of executable code which takes 
no arguments and returns a function pointer. If an STT_GNU_IFUNC symbol 
is referred to by a relocation, then evaluation of that relocation is delayed until 
load-time. The value used in the relocation is the function pointer returned by an 
invocation of the STT_GNU_IFUNC symbol. 

The purpose of the STT_GNU_IFUNC symbol type is to allow the run-time to 
select between multiple versions of the implementation of a specific function. The 
selection made in general will take the currently available hardware into account 
and select the most appropriate version. 


"It is specified in ifunc.txt at http://sites.google.com/site/x32abi/ 
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4.4 Relocation 


4.4.1 Relocation Types 
Figure shows the allowed relocatable fields. 


Figure 4.1: Relocatable Fields 


7 words 0) 


15 wordl6 0 


31 word32 0) 

63 word64 0 
words This specifies a 8-bit field occupying | byte. 
word1l6 This specifies a 16-bit field occupying 2 bytes with arbitrary 


byte alignment. These values use the same byte order as 
other word values in the AMD64 architecture. 

word32 This specifies a 32-bit field occupying 4 bytes with arbitrary 
byte alignment. These values use the same byte order as 
other word values in the AMD64 architecture. 

word64 This specifies a 64-bit field occupying 8 bytes with arbitrary 
byte alignment. These values use the same byte order as 
other word values in the AMD64 architecture. 

The following notations are used for specifying relocations in table 4.10} 


A Represents the addend used to compute the value of the relocatable field. 
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B Represents the base address at which a shared object has been loaded into mem- 
ory during execution. Generally, a shared object is built with a 0 base virtual 
address, but the execution address will be different. 


G Represents the offset into the global offset table at which the relocation entry’s 
symbol will reside during execution. 


GOT Represents the address of the global offset table. 


L Represents the place (section offset or address) of the Procedure Linkage Table 
entry for a symbol. 


P Represents the place (section offset or address) of the storage unit being relo- 
cated (computed using r_offset). 


S Represents the value of the symbol whose index resides in the relocation entry. 
Z Represents the size of the symbol whose index resides in the relocation entry. 


The AMD64 ABI architectures uses only E1£64_ Rela relocation entries 
with explicit addends. The r_addend member serves as the relocation addend. 


70 


AMD64 ABI Draft 0.99.6 — October 7, 2013 — 10:35 


Table 4.10: Relocation Types 


Name Value Field Calculation 
R_X86_64 NONE O | none none 
R_X86_64_64 1 | word64 S +A 
R_X86_64_PC32 2 | word32 S+A-P 
R_X86_64_GOT32 3 | word32 GtaA 
R_X86_64 PLT32 4 | word32 L+A- P 
R_X86_64_ COPY 5 | none none 
R_X86_64 _GLOB_DAT 6 | word64 S 
R_X86_64 JUMP_SLOT 7 | word64 S 
R_X86_64 RELATIVE 8 | word64 B+aA 
R_X86_64 GOTPCREL 9 | word32 G + GOT +A 
R_X86_64_ 32 10 | word32 S +A 
R_X86_64_ 32S 11 | word32 S +A 
R_X86_64_16 12 | word16 S +A 
R_X86_64 PC16 13 | word16 S+A- P 
R_X86_64_8 14 | wordS S +A 
R_X86_64 PC8 15 | word& S+A-P 
R_X86_64_DTPMOD64 16 | word64 
R_X86_64 DTPOFF64 17 | word64 
R_X86_64_ TPOFF64 18 | word64 
R_X86_64 TLSGD 19 | word32 
R_X86_64_ TLSLD 20 | word32 
R_X86_64 DTPOFF32 21 | word32 
R_X86_64 GOTTPOFF 22 | word32 
R_X86_64_ TPOFF32 23 | word32 
R_X86_64 PC64 24 | word64 S+A- P 
R_X86_64 GOTOFF64 25 | word64 S + A - GOT 
R_X86_64_ GOTPC32 26 | word32 GOT + A - P 
R_X86_64 SIZE32 32 | word32 Z+A 
R_X86_64 SIZK64 33 | word64 Z+A 
R_X86_64_GOTPC32_TLSDESC 34 | word32 
R_X86_64_TLSDESC_CALL 35 | none 
R_X86_64 TLSDESC 36 | word64 x2 
R_X86_64_IRELATIVE 37 | word64 indirect (B + A) 
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The special semantics for most of these relocation types are identical to those 
used for the Intel386 ABI. 

The R_X86_64 GOTPCREL relocation has different semantics from the 
R_X86_64_GOT32 or equivalent 1386 R_I386_GOTPC relocation. In partic- 
ular, because the AMD64 architecture has an addressing mode relative to the in- 
struction pointer, it is possible to load an address from the GOT using a single in- 
struction. The calculation done by the R_X86_64_GOTPCREL relocation gives 
the difference between the location in the GOT where the symbol’s address is 
given and the location where the relocation is applied. 

The R_X86_64_32 and R_X86_64_32S relocations truncate the com- 
puted value to 32-bits. The linker must verify that the generated value for the 
R_X86_64_32 (R_X86_64_ 32S) relocation zero-extends (sign-extends) to the 
original 64-bit value. 

A program or object file using R_X86_64 8, R_X86_64 16, 
R_X86_64_PC16 or R_X86_64_PC8 relocations is not conformant to 
this ABI, these relocations are only added for documentation purposes. The 
R_X86_64_16, and R_X86_64_8 relocations truncate the computed value to 
16-bits resp. 8-bits. 

The’ relocations R_X86_64 DTPMOD64, R_X86_64 DTPOFF64, 
R_X86_64_TPOFF64, R_X86_64_TLSGD, R_X86_64_TLSLD, 
R_X86_64_DTPOFF32, R_X86_64_GOTTPOFF and R_X86_64_TPOFF32 
are listed for completeness. They are part of the Thread-Local Storage ABI 
extensions and are documented in the document called “ELF Handling for 
Thread-Local Storage’ [| The relocations R_X86_64_GOTPC32_TLSDESC, 
R_X86_64_TLSDESC_CALL and R_X86_64 _TLSDESC are also used for 
Thread-Local Storage, but are not documented there as of this writing. A 
description can be found in the document “Thread-Local Storage Descriptors for 


Even though the AMD64 architecture supports IP-relative addressing modes, a GOT is still 
required since the offset from a particular instruction to a particular data item cannot be known by 
the static linker. 

4Note that the AMD64 architecture assumes that offsets into GOT are 32-bit values, not 64-bit 
values. This choice means that a maximum of 2°7/8 = 27° entries can be placed in the GOT. 
However, that should be more than enough for most programs. In the event that it is not enough, 
the linker could create multiple GOTs. Because 32-bit offsets are used, loads of global data do 
not require loading the offset into a displacement register; the base plus immediate displacement 
addressing form can be used. 


>This document is currently available via http: //people. redhat .com/drepper/ 
tls...pdast 
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1A32 and AMD64/EM64T {| 

In order to make this document self-contained, a description of the TLS relo- 
cations follows. 

R_X86_64_DTPMOD64 resolves to the index of the dynamic thread vec- 
tor entry that points to the base address of the TLS block corresponding to 
the module that defines the referenced symbol. R_X86_64 DTPOFF64 and 
R_X86_64_DTPOFF32 compute the offset from the pointer in that entry to 
the referenced symbol. The linker generates such relocations in adjacent en- 
tries in the GOT, in response to R_X86_64_TLSGD and R_X86_64_TLSLD 
relocations. If the linker can compute the offset itself, because the referenced 
symbol binds locally, the relocations R_X86_64_64 and R_X86_64_32 may 
be used instead. Otherwise, such relocations are always in pairs, such that the 
R_X86_64_DTPOFF64 relocation applies to the word64 right past the corre- 
sponding R_X86_64_DTPMOD64 relocation. 

R_X86_64_TPOFF64 and R_X86_64_TPOFF32 resolve to the offset from 
the thread pointer to a thread-local variable. The former is generated in response 
to R_X86_64_GOTTPOFF, that resolves to a PC-relative address of a GOT entry 
containing such a 64-bit offset. 

R_X86_64 _TLSGD andR_X86_64_TLSLD both resolve to PC-relative off- 


sets toa DTPMOD GOT entry. The difference between them is that, for R_X86_64_TLSGI 


the following GOT entry will contain the offset of the referenced symbol into its 
TLS block, whereas, for R_X8 6_64_TLSLD, the following GOT entry will con- 
tain the offset for the base address of the TLS block. The idea is that adding this 
offset to the result of R_X86_64 DTPMOD32 for a symbol ought to yield the 
same as the result of R_X86_64_DTPMOD64 for the same symbol. 
R_X86_64_TLSDESC resolves to a pair of word64s, called TLS Descriptor, 
the first of which is a pointer to a function, followed by an argument. The function 
is passed a pointer to the this pair of entries in %rax and, using the argument in 
the second entry, it must compute and return in %rax the offset from the thread 
pointer to the symbol referenced in the relocation, without modifying any reg- 
isters other than processor flags. R_X86_64 GOTPC32_TLSDESC resolves to 
the PC-relative address of a TLS descriptor corresponding to the named symbol. 
R_X86_64 _TLSDESC_CALL must annotate the instruction used to call the TLS 
Descriptor resolver function, so as to enable relaxation of that instruction. 
R_X86_64_IRELATIVE is similar to R_X86_64_ RELATIVE except that 


This document is currently available via http: //people.redhat.com/aoliva/ 
writeups/TLS/RFC-TLSDESC-x86.txt 
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the value used in this relocation is the program address returned by the func- 
tion, which takes no arguments, at the address of the result of the corresponding 
R_X86_64 RELATIVE relocation. 

One use of the R_X86_64_ IRELATIVE relocation is to avoid name lookup 
for the locally defined STT_GNU_IFUNC symbols at load-time. Support for this 
relocation is optional, but is required for the STT_GNU_IFUNC symbols. 


4.4.2 Large Models 


In order to extend both the PLT and the GOT beyond 2GB, it is necessary to add 
appropriate relocation types to handle full 64-bit addressing. See figure 


Table 4.11: Large Model Relocation Types 


Name Value | Field Calculation 
R_X86_64_ GOT64 27 word64 |G +A 
R_X86_64_ GOTPCREL64 | 28 word64 |G + GOT - P+A 
R_X86_64_GOTPC64 29 word64 | GOT - P + A 
R_X86_64_GOTPLT64 30 word64 |G + A 
R_X86_64 PLTOFF64 31 word64 | L - GOT +A 
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Chapter 5 


Program Loading and Dynamic 
Linking 


5.1 Program Loading 


Program loading is a process of mapping file segments to virtual memory seg- 
ments. For efficient mapping executable and shared object files must have seg- 
ments whose file offsets and virtual addresses are congruent modulo the page 
Size. 

To save space the file page holding the last page of the text segment may 
also contain the first page of the data segment. The last data page may contain file 
information not relevant to the running process. Logically, the system enforces the 
memory permissions as if each segment were complete and separate; segments’ 
addresses are adjusted to ensure each logical page in the address space has a single 
set of permissions. In the example above, the region of the file holding the end 
of text and the beginning of data will be mapped twice: at one virtual address for 
text and at a different virtual address for data. 

The end of the data segment requires special handling for uninitialized data, 
which the system defines to begin with zero values. Thus if a file’s last data page 
includes information not in the logical memory page, the extraneous data must be 
set to zero, not the unknown contents of the executable file. “Impurities” in the 
other three pages are not logically part of the process image; whether the system 
expunges them is unspecified. 

One aspect of segment loading differs between executable files and shared 
objects. Executable file segments typically contain absolute code (see section[3.5| 
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“Coding Examples”). For the process to execute correctly, the segments must 
reside at the virtual addresses used to build the executable file. Thus the system 
uses the p_vaddr values unchanged as virtual addresses. 

On the other hand, shared object segments typically contain position-independent 
code. This lets a segments virtual address change from one process to another, 
without invalidating execution behavior. Though the system chooses virtual ad- 
dresses for individual processes, it maintains the segments’ relative positions. Be- 
cause position-independent code uses relative addressing between segments, the 
difference between virtual addresses in memory must match the difference be- 
tween virtual addresses in the file. 


5.1.1 Program header 
The following AMD64 program header types are defined: 


Table 5.1: Program Header Types 


Name Value 
PT_GNU_EH FRAME 0x6474e550 
PT_SUNW_EH_ FRAME | 0x6474e550 
PT_SUNW_UNWIND 0x6464e550 


PT_GNU_EH_FRAME, PT_SUNW_EH_FRAME and PT_SUNW_UNWIND 
The segment contains the stack unwind tables. See Section/4.2.4/of this doc- 
ument. 


5.2. Dynamic Linking 


Dynamic Section 


Dynamic section entries give information to the dynamic linker. Some of this 
information is processor-specific, including the interpretation of some entries in 
the dynamic structure. 


' The value for these program headers have been placed in the PT_LOOS and PT_HIOS (os 
specific range) in order to adapt to the existing GNU implementation. New OS’s wanting to agree 
on these program header should also add it into their OS specific range. 
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Global Offset Table (GOT) 


Position-independent code cannot, in general, contain absolute virtual addresses. 
Global offset tables hold absolute addresses in private data, thus making the ad- 
dresses available without compromising the position-independence and shareabil- 
ity of a program’s text. A program references its global offset table using position- 
independent addressing and extracts absolute values, thus redirecting position- 
independent references to absolute locations. 

If a program requires direct access to the absolute address of a symbol, that 
symbol will have a global offset table entry. Because the executable file and shared 
objects have separate global offset tables, a symbol’s address may appear in sev- 
eral tables. The dynamic linker processes all the global offset table relocations 
before giving control to any code in the process image, thus ensuring the absolute 
addresses are available during execution. 

The tables first entry (number zero) is reserved to hold the address of the dy- 
namic structure, referenced with the symbol _DYNAMIC. This allows a program, 
such as the dynamic linker, to find its own dynamic structure without having yet 
processed its relocation entries. This is especially important for the dynamic 
linker, because it must initialize itself without relying on other programs to re- 
locate its memory image. On the AMD64 architecture, entries one and two in the 
global offset table also are reserved. 

The global offset table contains 64-bit addresses. 

For the large models the GOT is allowed to be up to 16EB in size. 


Figure 5.1: Global Offset Table 


extern E1f£64_ Addr _GLOBAL OFFSET TABLE []; 


The symbol _GLOBAL_OFFSET_TABLE_ may reside in the middle of the 
.got section, allowing both negative and non-negative offsets into the array of 
addresses. 

Function Addresses 

References to the address of a function from an executable file and the shared 

objects associated with it might not resolve to the same value. References from 
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within shared objects will normally be resolved by the dynamic linker to the vir- 
tual address of the function itself. References from within the executable file to 
a function defined in a shared object will normally be resolved by the link editor 
to the address of the procedure linkage table entry for that function within the 
executable file. 

To allow comparisons of function addresses to work as expected, if an exe- 
cutable file references a function defined in a shared object, the link editor will 
place the address of the procedure linkage table entry for that function in its as- 
sociated symbol table entry. This will result in symbol table entries with section 
index of SHN_UNDEF but a type of STT_FUNC and a non-zero st_value. A 
reference to the address of a function from within a shared library will be satisfied 
by such a definition in the executable. 

Some relocations are associated with procedure linkage table entries. These 
entries are used for direct function calls rather than for references to function 
addresses. These relocations do not use the special symbol value described above. 
Otherwise a very tight endless loop would be created. 


Procedure Linkage Table 


Much as the global offset table redirects position-independent address calculations 
to absolute locations, the procedure linkage table redirects position-independent 
function calls to absolute locations. The link editor cannot resolve execution trans- 
fers (such as function calls) from one executable or shared object to another. Con- 
sequently, the link editor arranges to have the program transfer control to entries 
in the procedure linkage table. On the AMD64 architecture, procedure linkage ta- 
bles reside in shared text, but they use addresses in the private global offset table. 
The dynamic linker determines the destinations’ absolute addresses and modifies 
the global offset table’s memory image accordingly. The dynamic linker thus can 
redirect the entries without compromising the position-independence and share- 
ability of the program’s text. Executable files and shared object files have separate 
procedure linkage tables. Unlike Intel386 ABI, this ABI uses the same procedure 
linkage table for both programs and shared objects (see figure 5.2). 
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Figure 5.2: Procedure Linkage Table (small and medium models) 


.PLTO: pushq GOT+8 (Srip) GOT [1] 
jmp *GOT+16 (Srip) # GOT[2] 
nop 
nop 
nop 
nop 
-PLT1: jmp xname1l@GOTPCREL (%rip) 16 bytes from .PLI1 
pushq Sindexl 
jmp -PLTO 
-PLT2: jmp xname2@GOTPCREL (%rip) 16 bytes from .PLI1 
pushq Sindex2 
jmp -PLTO 
PLT3 


Following the steps below, the dynamic linker and the program “cooperate” 
to resolve symbolic references through the procedure linkage table and the global 
offset table. 


1. When first creating the memory image of the program, the dynamic linker 
sets the second and the third entries in the global offset table to special 
values. Steps below explain more about these values. 


2. Each shared object file in the process image has its own procedure linkage 
table, and control transfers to a procedure linkage table entry only from 
within the same object file. 


3. For illustration, assume the program calls name1, which transfers control 
to the label .PLT1. 


4. The first instruction jumps to the address in the global offset table entry for 
namel. Initially the global offset table holds the address of the following 
pushgq instruction, not the real address of namel1. 


5. Now the program pushes a relocation index (index) on the stack. The reloca- 
tion index is a 32-bit, non-negative index into the relocation table addressed 
by the DT_JMPREL dynamic section entry. The designated relocation en- 
try will have type R_X86_64_JUMP_SLOT, and its offset will specify the 
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global offset table entry used in the previous jmp instruction. The reloca- 
tion entry contains a symbol table index that will reference the appropriate 
symbol, name1 in the example. 


6. After pushing the relocation index, the program then jumps to . PLTO, the 
first entry in the procedure linkage table. The pushq instruction places the 
value of the second global offset table entry (GOT+8) on the stack, thus giv- 
ing the dynamic linker one word of identifying information. The program 
then jumps to the address in the third global offset table entry (GOT+16), 
which transfers control to the dynamic linker. 


7. When the dynamic linker receives control, it unwinds the stack, looks at 
the designated relocation entry, finds the symbol’s value, stores the “real” 
address for name1 in its global offset table entry, and transfers control to 
the desired destination. 


8. Subsequent executions of the procedure linkage table entry will transfer 
directly to name1, without calling the dynamic linker a second time. That 
is, the jmp instruction at .PLT1 will transfer to name1, instead of “falling 
through” to the pushgq instruction. 


The LD_BIND_NOW environment variable can change the dynamic linking 
behavior. If its value is non-null, the dynamic linker evaluates procedure linkage 
table entries before transferring control to the program. That is, the dynamic linker 
processes relocation entries of type R_X86_64_JUMP_SLOT during process 
initialization. Otherwise, the dynamic linker evaluates procedure linkage table 
entries lazily, delaying symbol resolution and relocation until the first execution 
of a table entry. 

Relocation entries of type R_X86_64_TLSDESC may also be subject to lazy 
relocation, using a single entry in the procedure linkage table and in the global 
offset table, at locations given by DT_TLSDESC_PLT and DT_TLSDESC_GOT, 
respectively, as described in “Thread-Local Storage Descriptors for IA32 and 
AMD64/EM64T P| 

For self-containment, DT_TLSDESC_GOT specifies a GOT entry in which the 
dynamic loader should store the address of its internal TLS Descriptor resolver 
function, whereas DT_TLSDESC_PLT specifies the address of a PLT entry to be 


?This document is currently available via http: //people.redhat.com/aoliva/ 
writeups/TLS/RFC-TLSDESC-x86.txt 
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used as the TLS descriptor resolver function for lazy resolution from within this 
module. The PLT entry must push the linkmap of the module onto the stack and 
tail-call the internal TLS Descriptor resolver function. 


Large Models 


In the small and medium code models the size of both the PLT and the GOT is 
limited by the maximum 32-bit displacement size. Consequently, the base of the 
PLT and the top of the GOT can be at most 2GB apart. 

Therefore, in order to support the available addressing space of 16EB, it is nec- 
essary to extend both the PLT and the GOT. Moreover, the PLT needs to support 
the GOT being over 2GB away and the GOT can be over 2GB in sizef)| 

The PLT is extended as shown in figure[5.3|with the assumption that the GOT 
address is in $r1 


31f it is determined that the base of the PLT is within 2GB of the top of the GOT, it is also 
allowed to use the same PLT layout for a large code model object as that of the small and medium 
code models. 

4See Function Prologue. 
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Figure 5.3: Final Large Code Model PLT 


-PLTO: pushq 8(%r15) GOT [1] 
jmpq *16(%r15) GOT [2] 
rep 
rep 
rep 
nop 
rep 
rep 
rep 
nop 
-PLT1: movabs Snamel@GOT, %r11 16 bytes from .PLTO 
jmp *($r11,%r15) 
-PLTla: pushq Sindexl "call" dynamic linker 
jmp .PLTO 
“PETZ vee 21 bytes from .PLT1 
-PLTx: movabs Snamex@GOT, %r11 102261125th entry 
jmp *($r11,%r15) 
PLTxa: pushgq Sindexx 
pushgq 8 (%r15) repeat .PLTO code 
jmpq *16(%r15) 
-PLTy: ... 27 bytes from .PLTx 


This way, for the first 102261125 entries, each PLT entry besides . PLTO uses 
only 21 bytes. Afterwards, the PLT entry code changes by repeating that of .PLTO, 
when each PLT entry is 27 bytes long. Notice that any alignment consideration is 
dropped in order to keep the PLT size down. 

Each extended PLT entry is thus 5 to 11 bytes larger than the small and 
medium code model PLT entries. 

The functionality of entry .PLTO remains unchanged from the small and medium 
code models. 

Note that the symbol index is still limited to 32 bits, which would allow for up 
to 4G global and external functions. 

Typically, UNIX compilers support two types of PLT, generally through the 
options -fpic and —£PIC. When building position-independent objects using 
the large code model, only -£PIC is allowed. Using the option —fpic with the 
large code model remains reserved for future use. 
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5.2.1 Program Interpreter 


There is one valid program interpreter for programs conforming to the AMD64 
ABI: 


/lib/1d64.so.1 
However, Linux puts this in 


/1i1b64/1d-linux-x86-64.s0.2 


5.2.2 Initialization and Termination Functions 


The implementation is responsible for executing the initialization functions spec- 
ified by DT_INIT, DT_INIT_ARRAY, and DT_PREINIT_ARRAY entries in 
the executable file and shared object files for a process, and the termination (or 
finalization) functions specified by DT_FINI and DT_FINI_ARRAY, as speci- 
fied by the System V ABI. The user program plays no further part in executing the 
initialization and termination functions specified by these dynamic tags. 
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Chapter 6 


Libraries 


A further review of the Intel386 ABI is needed. 


6.1 C Library 


6.1.1 Global Data Symbols 


The symbols _fp_hw, _ f 1t__rounds and __huge_val are not provided by 
the AMD64 ABI. 


6.1.2 Floating Point Environment Functions 


ISO C 99 defines the floating point environment functions from <fenv.h>. 
Since AMD64 has two floating point units with separate control words, the pro- 
gramming environment has to keep the control values in sync. On the other hand 
this means that routines accessing the control words only need to access one unit, 
and the SSE unit is the unit that should be accessed in these cases. The function 
fegetround therefore only needs to report the rounding value of the SSE unit 
and can ignore the x87 unit. 
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6.2 Unwind Library Interface 


This section defines the Unwind Library interfacd!| expected to be provided by 
any AMD64 psABI-compliant system. This is the interface on which the C++ 
ABI exception-handling facilities are built. We assume as a basis the Call Frame 
Information tables described in the DWARF Debugging Information Format doc- 
ument. 

This section is meant to specify a language-independent interface that can be 
used to provide higher level exception-handling facilities such as those defined by 
C++. 

The unwind library interface consists of at least the following routines: 
wind_RaiseException, 
wind_Resume , 
wind_DeleteException, 
wind_GetGR, 
wind_SetGR, 
wind_GetIP, 
wind_SetIP, 
wind_GetRegionStart, 
wind_GetLanguageSpecificData, 


wind_ForcedUnwind, 
wind_GetCFA 

In addition, two data types are defined (_Unwind_Context and_Unwind_Exception 
) to interface a calling runtime (such as the C++ runtime) and the above rou- 
tine. All routines and interfaces behave as if defined extern "C". In particular, 
the names are not mangled. All names defined as part of this interface have a 
"_Unwind_” prefix. 

Lastly, a language and vendor specific personality routine will be stored by 
the compiler in the unwind descriptor for the stack frames requiring exception 
processing. The personality routine is called by the unwinder to handle language- 
specific tasks such as identifying the frame handling a particular exception. 


C 
oS o> 8 Sb Ss & BS Bo oS Bf 


'The overall structure and the external interface is derived from the IA-64 UNIX System V 
ABI 
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6.2.1 Exception Handler Framework 
Reasons for Unwinding 


There are two major reasons for unwinding the stack: 
e exceptions, as defined by languages that support them (such as C++) 
e “forced” unwinding (such as caused by Long jmp or thread termination) 


The interface described here tries to keep both similar. There is a major dif- 
ference, however. 


e In the case where an exception is thrown, the stack is unwound while the 
exception propagates, but it is expected that the personality routine for each 
stack frame knows whether it wants to catch the exception or pass it through. 
This choice is thus delegated to the personality routine, which is expected to 
act properly for any type of exception, whether “native” or “foreign”. Some 
guidelines for “acting properly” are given below. 


e During “forced unwinding”, on the other hand, an external agent is driving 
the unwinding. For instance, this can be the Long jmp routine. This exter- 
nal agent, not each personality routine, knows when to stop unwinding. The 
fact that a personality routine is not given a choice about whether unwinding 
will proceed is indicated by the _UA_FORCE_UNWIND flag. 


To accommodate these differences, two different routines are proposed. _Unwind_RaiseExcepti 
performs exception-style unwinding, under control of the personality routines. 
_Unwind_ForcedUnwind, on the other hand, performs unwinding, but gives 
an external agent the opportunity to intercept calls to the personality routine. This 
is done using a proxy personality routine, that intercepts calls to the personality 
routine, letting the external agent override the defaults of the stack frame’s per- 
sonality routine. 

As a consequence, it is not necessary for each personality routine to know 
about any of the possible external agents that may cause an unwind. For instance, 
the C++ personality routine need deal only with C++ exceptions (and possibly 
disguising foreign exceptions), but it does not need to know anything specific 
about unwinding done on behalf of Long jmp or pthreads cancellation. 
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The Unwind Process 


The standard ABI exception handling/unwind process begins with the raising of an 
exception, in one of the forms mentioned above. This call specifies an exception 
object and an exception class. 

The runtime framework then starts a two-phase process: 


e In the search phase, the framework repeatedly calls the personality routine, 
with the _UA_SEARCH_PHASE flag as described below, first for the cur- 
rent Srip and register state, and then unwinding a frame to a new Srip 
at each step, until the personality routine reports either success (a handler 
found in the queried frame) or failure (no handler) in all frames. It does not 
actually restore the unwound state, and the personality routine must access 
the state through the API. 


e If the search phase reports a failure, e.g. because no handler was found, it 
will call terminate () rather than commence phase 2. 


If the search phase reports success, the framework restarts in the cleanup 

phase. Again, it repeatedly calls the personality routine, with the _UA_CLEANUP_PHASE 
flag as described below, first for the current Srip and register state, and 

then unwinding a frame to a new %rip at each step, until it gets to the 

frame with an identified handler. At that point, it restores the register state, 

and control is transferred to the user landing pad code. 


Each of these two phases uses both the unwind library and the personality 
routines, since the validity of a given handler and the mechanism for transferring 
control to it are language-dependent, but the method of locating and restoring 
previous stack frames is language-independent. 

A two-phase exception-handling model is not strictly necessary to implement 
C++ language semantics, but it does provide some benefits. For example, the first 
phase allows an exception-handling mechanism to dismiss an exception before 
stack unwinding begins, which allows presumptive exception handling (correcting 
the exceptional condition and resuming execution at the point where it was raised). 
While C++ does not support presumptive exception handling, other languages do, 
and the two-phase model allows C++ to coexist with those languages on the stack. 

Note that even with a two-phase model, we may execute each of the two phases 
more than once for a single exception, as if the exception was being thrown more 
than once. For instance, since it is not possible to determine if a given catch clause 
will re-throw or not without executing it, the exception propagation effectively 
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stops at each catch clause, and if it needs to restart, restarts at phase 1. This 
process is not needed for destructors (cleanup code), so the phase | can safely 
process all destructor-only frames at once and stop at the next enclosing catch 
clause. 

For example, if the first two frames unwound contain only cleanup code, and 
the third frame contains a C++ catch clause, the personality routine in phase 1, 
does not indicate that it found a handler for the first two frames. It must do so for 
the third frame, because it is unknown how the exception will propagate out of 
this third frame, e.g. by re-throwing the exception or throwing a new one in C++. 

The API specified by the AMD64 psABI for implementing this framework is 
described in the following sections. 


6.2.2 Data Structures 
Reason Codes 


The unwind interface uses reason codes in several contexts to identify the reasons 
for failures or other actions, defined as follows: 
typedef enum { 

URC_NO_REASON = 0, 
URC_FOREIGN_EXCEPTION_ CAUGHT = 1, 
URC_FATAL PHASE2_ ERROR Ze, 
URC_FATAL PHASE1 ERROR Si 
URC_NORMAL_ STOP = 4, 
URC_END_OF_STACK = 5, 
URC_HANDLER_FOUND = 6, 
URC_INSTALL CONTEXT = 7, 
URC_CONTINUE_UNWIND = 8 

} _Unwind_Reason_Code; 
The interpretations of these codes are described below. 


Exception Header 


The unwind interface uses a pointer to an exception header object as its repre- 
sentation of an exception being thrown. In general, the full representation of an 
exception object is language- and implementation-specific, but is prefixed by a 
header understood by the unwind interface, defined as follows: 
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typedef void (*_Unwind_Exception_Cleanup_Fn) 
(_Unwind_Reason_Code reason, 


struct _Unwind_Exception *exc); 

struct _Unwind_Exception { 
uint64 exception_class; 
_Unwind_Exception_Cleanup_Fn exception_cleanup; 
uint 64 private_1l; 
uint 64 private_2; 


}} 


An _Unwind_Exception object must be eightbyte aligned. The first two 
fields are set by user code prior to raising the exception, and the latter two should 
never be touched except by the runtime. 

The exception_class field is a language- and implementation-specific 
identifier of the kind of exception. It allows a personality routine to distinguish 
between native and foreign exceptions, for example. By convention, the high 4 
bytes indicate the vendor (for instance AMD\0), and the low 4 bytes indicate the 
language. For the C++ ABI described in this document, the low four bytes are 
C++\0. 

The except ion_cleanup routine is called whenever an exception object 
needs to be destroyed by a different runtime than the runtime which created the 
exception object, for instance if a Java exception is caught by a C++ catch handler. 
In such a case, a reason code (see above) indicates why the exception object needs 
to be deleted: 


_URC_FOREIGN_EXCEPTION_CAUGHT = 1 This indicates that a different 
runtime caught this exception. Nested foreign exceptions, or re-throwing a 
foreign exception, result in undefined behavior. 


_URC_FATAL_PHASE1_ERROR = 3 The personality routine encountered an 
error during phase 1, other than the specific error codes defined. 


_URC_FATAL PHASE2 ERROR = 2 The personality routine encountered an 
error during phase 2, for instance a stack corruption. 


Normally, all errors should be reported during phase 1 by returning from 
_Unwind_RaiseException. However, landing pad code could cause stack 
corruption between phase | and phase 2. For a C++ exception, the runtime should 
call terminate () in that case. 
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The private unwinder state (private_1 and private_2) in an exception 
object should be neither read by nor written to by personality routines or other 
parts of the language-specific runtime. It is used by the specific implementation 
of the unwinder on the host to store internal information, for instance to remember 
the final handler frame between unwinding phases. 

In addition to the above information, a typical runtime such as the C++ run- 
time will add language-specific information used to process the exception. This 
is expected to be a contiguous area of memory after the _Unwind_Exception 
object, but this is not required as long as the matching personality routines know 
how to deal with it, and the except ion_cleanup routine de-allocates it prop- 
erly. 


Unwind Context 


The _Unwind_Context type is an opaque type used to refer to a system- 

specific data structure used by the system unwinder. This context is created and 

destroyed by the system, and passed to the personality routine during unwinding. 
struct _Unwind_Context 


6.2.3 Throwing an Exception 
_Unwind_RaiseException 


_Unwind_Reason_Code _Unwind_RaiseException 
( struct _Unwind_Exception *exception_object ); 

Raise an exception, passing along the given exception object, which should 
have its exception_class and exception_cleanup fields set. The ex- 
ception object has been allocated by the language-specific runtime, and has a 
language-specific format, except that it must contain an_Unwind_Exception 
struct (see Exception Header above). _Unwind_RaiseException does not 
return, unless an error condition is found (such as no handler for the exception, 
bad stack format, etc.). In such a case, an _Unwind_Reason_Code value is 
returned. 

Possibilities are: 


URC_END_OF_STACK The unwinder encountered the end of the stack during 
phase 1, without finding a handler. The unwind runtime will not have modi- 
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fied the stack. The C++ runtime will normally call uncaught_exception () 
in this case. 


_URC_FATAL_PHASE1_ERROR The unwinder encountered an unexpected er- 
ror during phase 1, e.g. stack corruption. The unwind runtime will not have 
modified the stack. The C++ runtime will normally call terminate () in 
this case. 


If the unwinder encounters an unexpected error during phase 2, it should re- 
turn _URC_FATAL_PHASE2_ERROR to its caller. In C++, this will usually be 
___cxa_throw, which will call terminate (). 

The unwind runtime will likely have modified the stack (e.g. popped frames 
from it) or register context, or landing pad code may have corrupted them. As a 
result, the the caller of _Unwind_RaiseException can make no assumptions 
about the state of its stack or registers. 


_Unwind_ForcedUnwind 


typedef _Unwind_Reason_Code (*«_Unwind_Stop_Fn) 
(int version, 

Unwind_Action actions, 

int64 exceptionClass, 

truct _Unwind_Exception *exceptionObject, 

truct _Unwind_Context *context, 


Uu 
Ss 
Ss 
Vv 


oid xstop_parameter ); 


_Unwind_Reason_Code_Unwind_ForcedUnwind 
( struct _Unwind_Exception *exception_object, 
_Unwind_Stop_Fn stop, 
void xstop_parameter ); 

Raise an exception for forced unwinding, passing along the given exception 
object, which should have its except ion_class and exception_cleanup 
fields set. The exception object has been allocated by the language-specific run- 
time, and has a language-specific format, except that it must containan_Unwind_Exception 
struct (see Exception Header above). 

Forced unwinding is a single-phase process (phase 2 of the normal exception- 
handling process). The stop and stop_parameter parameters control the 
termination of the unwind process, instead of the usual personality routine query. 
The stop function parameter is called for each unwind frame, with the pa- 
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rameters described for the usual personality routine below, plus an additional 
stop_parameter. 

When the st op function identifies the destination frame, it transfers control 
(according to its own, unspecified, conventions) to the user code as appropriate 
without returning, normally after calling _Unwind_DeleteException. If 
not, it should return an_Unwind_Reason_Code value as follows: 


_URC_NO_ REASON This is not the destination frame. The unwind runtime will 
call the frame’s personality routine with the _UA_FORCE_UNWIND and 
_UA_CLEANUP_PHASE flags set in actions, and then unwind to the next 
frame and call the stop function again. 


URC_END_OF_STACK In order to allow_Unwind_ForcedUnwinad to per- 
form special processing when it reaches the end of the stack, the unwind 
runtime will call it after the last frame is rejected, with a NULL stack pointer 
in the context, and the stop function must catch this condition (i.e. by notic- 
ing the NULL stack pointer). It may return this reason code if it cannot 
handle end-of-stack. 


_URC_FATAL_ PHASE2_ ERROR The stop function may return this code for 
other fatal conditions, e.g. stack corruption. 


If the stop function returns any reason code other than __URC_NO_REASON, 
the stack state is indeterminate from the point of view of the caller of 
_Unwind_ForcedUnwind. Rather than attempt to return, therefore, the un- 
wind library should return __URC_FATAL_PHASE2_ERROR to its caller. 


Example: longjmp_unwind () 

The expected implementation of longjmp_unwind() is as follows. The 
set jmp() routine will have saved the state to be restored in its custom- 
ary place, including the frame pointer. The longjmp_unwind() routine 
will call _Unwind_ForcedUnwind with a stop function that compares the 
frame pointer in the context record with the saved frame pointer. If equal, 
it will restore the set jmp () state as customary, and otherwise it will return 
_URC_NO_REASON or _URC_END_OF_STACK. 

If a future requirement for two-phase forced unwinding were identified, an al- 
ternate routine could be defined to request it, and an actions parameter flag defined 
to support it. 
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_Unwind_ Resume 


void _Unwind_Resume 
(struct _Unwind_Exception x*exception_object) ; 

Resume propagation of an existing exception e.g. after executing cleanup code 
in a partially unwound stack. A call to this routine is inserted at the end of a 
landing pad that performed cleanup, but did not resume normal execution. It 
causes unwinding to proceed further. 

_Unwind_Resume should not be used to implement re-throwing. To the 
unwinding runtime, the catch code that re-throws was a handler, and the previous 
unwinding session was terminated before entering it. Re-throwing is implemented 
by calling Unwind_RaiseException again with the same exception object. 

This is the only routine in the unwind library which is expected to be called 
directly by generated code: it will be called at the end of a landing pad in a 
"landing-pad" model. 


6.2.4 Exception Object Management 


_Unwind_DeleteException 


void _Unwind_DeleteException 
(struct _Unwind_Exception x*xexception_object) ; 
Deletes the given exception object. If a given runtime resumes normal execu- 
tion after catching a foreign exception, it will not know how to delete that excep- 
tion. Such an exception will be deleted by calling _Unwind_DeleteException. 
This is a convenience function that calls the function pointed to by the except ion_cleanup 
field of the exception header. 


6.2.5 Context Management 


These functions are used for communicating information about the unwind con- 
text (i.e. the unwind descriptors and the user register state) between the unwind 
library and the personality routine and landing pad. They include routines to read 
or set the context record images of registers in the stack frame corresponding to a 
given unwind context, and to identify the location of the current unwind descrip- 
tors and unwind frame. 
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_Unwind _GetGR 


uint64 _Unwind_GetGR 
(struct _Unwind_Context *context, int index); 
This function returns the 64-bit value of the given general register. The register 
is identified by its index as given in[3.36} 
During the two phases of unwinding, no registers have a guaranteed value. 


_Unwind_ SetGR 


void _Unwind_SetGR 
(struct _Unwind_Context x*context, 
int index, 
uint64 new_value) ; 

This function sets the 64-bit value of the given register, identified by its index 
as for_Unwind_GetGR. 

The behavior is guaranteed only if the function is called during phase 2 of 
unwinding, and applied to an unwind context representing a handler frame, for 
which the personality routine will return _URC_INSTALL_CONTEXT. In that 
case, only registers ¢rdi, ¢rsi, ¢rdx, rcx should be used. These scratch 
registers are reserved for passing arguments between the personality routine and 
the landing pads. 


_Unwind Get IP 


uint64 _Unwind_GetIP 
(struct _Unwind_Context x*context); 

This function returns the 64-bit value of the instruction pointer (IP). 

During unwinding, the value is guaranteed to be the address of the instruction 
immediately following the call site in the function identified by the unwind con- 
text. This value may be outside of the procedure fragment for a function call that 
is known to not return (such as _Unwind_Resume). 


_Unwind_ SetIP 


void _Unwind_SetIP 
(struct _Unwind_Context xcontext, 
uint64 new_value); 
This function sets the value of the instruction pointer (IP) for the routine iden- 
tified by the unwind context. 
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The behavior is guaranteed only when this function is called for an unwind 
context representing a handler frame, for which the personality routine will return 
_URC_INSTALL_CONTEXT. In this case, control will be transferred to the given 
address, which should be the address of a landing pad. 


_Unwind_GetLanguageSpecificData 


uint64 _Unwind_GetLanguageSpecificData 
(struct _Unwind_Context x*context); 
This routine returns the address of the language-specific data area for the cur- 
rent stack frame. 


This routine is not strictly required: it could be accessed through __Unwind_GetI 


using the documented format of the DWARF Call Frame Information Tables, but 
since this work has been done for finding the personality routine in the first place, 
it makes sense to cache the result in the context. We could also pass it as an 
argument to the personality routine. 


_Unwind_GetRegionStart 


uint64 _Unwind_GetRegionStart 
(struct _Unwind_Context *context); 

This routine returns the address of the beginning of the procedure or code 
fragment described by the current unwind descriptor block. 

This information is required to access any data stored relative to the beginning 
of the procedure fragment. For instance, a call site table might be stored relative 
to the beginning of the procedure fragment that contains the calls. During un- 
winding, the function returns the start of the procedure fragment containing the 
call site in the current stack frame. 


_Unwind GetCFA 


uint64 _Unwind_GetCFA 
(struct _Unwind_Context x*context); 

This function returns the 64-bit Canonical Frame Address which is defined as 
the value of rsp at the call site in the previous frame. This value is guaranteed 
to be correct any time the context has been passed to a personality routine or a 
stop function. 
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6.2.6 Personality Routine 


_Unwind_Reason_Code (*__personality_routine) 
(int version, 
_Unwind_Action actions, 
uint64 exceptionClass, 
struct _Unwind_Exception *exceptionObject, 
struct _Unwind_Context x*context); 

The personality routine is the function in the C++ (or other language) run- 
time library which serves as an interface between the system unwind library and 
language-specific exception handling semantics. It is specific to the code fragment 
described by an unwind info block, and it is always referenced via the pointer in 
the unwind info block, and hence it has no psABI-specified name. 


Parameters 


The personality routine parameters are as follows: 


version Version number of the unwinding runtime, used to detect a mis-match 
between the unwinder conventions and the personality routine, or to provide 
backward compatibility. For the conventions described in this document, 
version will be 1. 


actions Indicates what processing the personality routine is expected to per- 
form, as a bit mask. The possible actions are described below. 


exceptionClass An 8-byte identifier specifying the type of the thrown ex- 
ception. By convention, the high 4 bytes indicate the vendor (for instance 
AMD\0), and the low 4 bytes indicate the language. For the C++ ABI 
described in this document, the low four bytes are C++\0. This is not a 
null-terminated string. Some implementations may use no null bytes. 


exceptionObject The pointer to a memory location recording the necessary 
information for processing the exception according to the semantics of a 
given language (see the Exception Header section above). 


context Unwinder state information for use by the personality routine. This is 
an opaque handle used by the personality routine in particular to access the 
frame’s registers (see the Unwind Context section above). 
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return value The return value from the personality routine indicates how further 
unwind should happen, as well as possible error conditions. See the follow- 
ing section. 


Personality Routine Actions 


The actions argument to the personality routine is a bitwise OR of one or more of 
the following constants: 
typedef int _Unwind_Action,; 
_UA_SEARCH PHASE = 1; 

_UA_CLEANUP_PHASE = 2; 
_UA_HANDLER_FRAME = 4; 


Const 
CONnst 
Const 
Const 


_Unwind_Action 
_Unwind_Action 
_Unwind_Action 
_Unwind_Action 


_UA_FORCE_UNWI 


ND = 8; 


_UA_SEARCH_PHASE Indicates that the personality routine should check if the 


current frame contains a handler, and if so return _URC_HANDLER_FOUND, 


or otherwise return URC _CONTINUE_UNW 
cannot be set at the same time as_UA_CLEANUP_PHASE. 


NI] 


D. UA _SEARCH_PHASE 


_UA_CLEANUP_PHASE Indicates that the personality routine should perform 
cleanup for the current frame. The personality routine can perform this 
cleanup itself, by calling nested procedures, and return __URC_CONT INUE_UNW 
Alternatively, it can setup the registers (including the IP) for transferring 
control to a "landing pad", and return _URC_ 


IN 


STALL CONTEXT. 


_UA_HANDLER_FRAME During phase 2, indicates to the personality routine 
that the current frame is the one which was flagged as the handler frame 
during phase 1. The personality routine is not allowed to change its mind 
between phase 1 and phase 2, i.e. it must handle the exception in this frame 
in phase 2. 


_UA_FORCE_UNWIND During phase 2, indicates that no language is allowed 
to "catch" the exception. This flag is set while unwinding the stack for 
long jmp or during thread cancellation. User-defined code in a catch clause 
may still be executed, but the catch clause must resume unwinding with a 
call to_Unwind_Resume when finished. 
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Transferring Control to a Landing Pad 


If the personality routine determines that it should transfer control to a landing 
pad (in phase 2), it may set up registers (including IP) with suitable values for 
entering the landing pad (e.g. with landing pad parameters), by calling the context 
management routines above. It then returns _URC_INSTALL_CONTEXT. 

Prior to executing code in the landing pad, the unwind library restores registers 
not altered by the personality routine, using the context record, to their state in that 
frame before the call that threw the exception, as follows. All registers specified 
as callee-saved by the base ABI are restored, as well as scratch registers rdi, 
Srsi, srdx, rcx (see below). Except for those exceptions, scratch (or caller- 
saved) registers are not preserved, and their contents are undefined on transfer. 

The landing pad can either resume normal execution (as, for instance, at the 
end of a C++ catch), or resume unwinding by calling _Unwind_Resume and 
passing it the except ionOb ject argument received by the personality routine. 
_Unwind_Resume will never return. 

_Unwind_Resume should be called if and only if the personality routine 
did not return _Unwind_HANDLER_FOUND during phase 1. As a result, the 
unwinder can allocate resources (for instance memory) and keep track of them in 
the exception object reserved words. It should then free these resources before 
transferring control to the last (handler) landing pad. It does not need to free the 
resources before entering non-handler landing-pads, since _Unwind_Resume 
will ultimately be called. 

The landing pad may receive arguments from the runtime, typically passed 
in registers set using _Unwind_SetGR by the personality routine. For a landing 
pad that can call to_Unwind_Resume, one argument must be the except ionObject 
pointer, which must be preserved to be passed to_Unwind_Resume. 

The landing pad may receive other arguments, for instance a switch value 
indicating the type of the exception. Four scratch registers are reserved for this 
use (Srdi, rsi, ¢rdx, $rcx). 


Rules for Correct Inter-Language Operation 


The following rules must be observed for correct operation between languages 
and/or run times from different vendors: 

An exception which has an unknown class must not be altered by the personal- 
ity routine. The semantics of foreign exception processing depend on the language 
of the stack frame being unwound. This covers in particular how exceptions from 
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a foreign language are mapped to the native language in that frame. 

If a runtime resumes normal execution, and the caught exception was created 
by another runtime, it should call __Unwind_DeleteException. This is true 
even if it understands the exception object format (such as would be the case 
between different C++ run times). 

A runtime is not allowed to catch an exception if the _UA_FORCE_UNWIND 
flag was passed to the personality routine. 


Example: Foreign Exceptions in C++. In C++, foreign exceptions can be 
caught byacatch(...) statement. They can also be caught as if they were of a 
__foreign_exception class, defined in <exception>. The__foreign_exception 
may have subclasses, suchas__ java_exceptionand__ada_exception, 
if the runtime is capable of identifying some of the foreign languages. 

The behavior is undefined in the following cases: 


e A__foreign_exception catch argument is accessed in any way (in- 
cluding taking its address). 


e A__foreign_exception is active at the same time as another excep- 
tion (either there is a nested exception while catching the foreign exception, 
or the foreign exception was itself nested). 


e uncaught_exception(),set_terminate(),set_unexpected(), 
terminate (), or unexpected () is called at a time a foreign excep- 
tion exists (for example, calling set_terminate() during unwinding 
of a foreign exception). 


All these cases might involve accessing C++ specific content of the thrown 
exception, for instance to chain active exceptions. 
Otherwise, a catch block catching a foreign exception is allowed: 


e to resume normal execution, thereby stopping propagation of the foreign 
exception and deleting it, or 


e to re-throw the foreign exception. In that case, the original exception object 
must be unaltered by the C++ runtime. 


A catch-all block may be executed during forced unwinding. For instance, a 
longjmp may execute code ina catch(...) during stack unwinding. However, 
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if this happens, unwinding will proceed at the end of the catch-all block, whether 
or not there is an explicit re-throw. 

Setting the low 4 bytes of exception class to C++\0 is reserved for use by C++ 
run-times compatible with the common C++ ABI. 


6.3 Unwinding Through Assembler Code 


For successful unwinding on AMD64 every function must provide a valid de- 
bug information in the DWARF Debugging Information Format. In high level 
languages (e.g. C/C++, Fortran, Ada, ...) this information is generated by the 
compiler itself. However for hand-written assembly routines the debug info must 
be provided by the author of the code. To ease this task some new assembler 
directives are added: 


.cfi_startproc is used at the beginning of each function that should have 
an entry in .eh_frame . It initializes some internal data structures and 
emits architecture dependent initial CFI instructions. Each .cfi_startproc 
directive has to be closed by . cfi_endproc. 


.cfi_endproc is used at the end of a function where it closes its unwind en- 
try previously opened by .cfi_startproc and emits itto .eh_frame. 


.cfi_def_cfa REGISTER, OFFSET defines a rule for computing CFA 
as: take address from REGISTER and add OFFSET to it. 


.cfi_def_cfa_register REGISTER modifies arule for computing CFA. 
From now on REGISTER will be used instead of the old one. The offset 
remains the same. 


.cfi_def_cfa_offset OFFSET modifies a rule for computing CFA. The 
register remains the same, but OFFSET is new. Note that this is the absolute 
offset that will be added to a defined register to compute the CFA address. 


.cfi_adjust_cfa_offset OFFSET issimilarto .cfi_def_cfa_offset 
but OFFSET is a relative value that is added or subtracted from the previous 
offset. 


.cfi_offset REGISTER, OFFSET saves the previous value of REGIS- 
TER at offset OFFSET from CFA. 
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.cfi_rel_offset REGISTER, OFFSET saves the previous value of REG- 
ISTER at offset OFFSET from the current CFA register. This is transformed 
to .cfi_offset using the known displacement of the CFA register from 
the CFA. This is often easier to use, because the number will match the code 

it is annotating. 


.cfi_escape EXPRESSION[, ...] allows the user to add arbitrary bytes 
to the unwind info. One might use this to add OS-specific CFI opcodes, or 
generic CFI opcodes that the assembler does not support. 
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Figure 6.1: Examples for Unwinding in Assembler 


# — function with local variable allocated on the stack 


.type func_locvars, @function 
func_locvars: 
Jeti Stareproc 


# allocate space for local vars 

sub $0x1234, %Srsp 
.cfi_adjust_cfa_offset 0x1234 

# body 

# release space of local vars and return 
add $0x1234, %Srsp 
.cfi_adjust_cfa_offset -0x1234 

ret 


.cfi_endproc 


function that moves frame pointer to another register 
and then allocates space for local variables 
.type func_otherreg, @function 
func_otherreg: 
.cfi_startproc 
# save frame pointer to r12 


movq Srsp, sr12 
.cfi_def_cfa_register r12 


# allocate space for local vars 

# (no .cfi_{def,adjust}_cfa_offset needed here, 
# because CFA is computed from r12!) 

sub $100, Srsp 

# body 


# restore frame pointer from r12 


movq Sr1l2, %srsp 
.cfi_def_cfa_register rsp 
ret 


.cfi_endproc 
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Chapter 7 


Development Environment 


During compilation of C or C++ code at least the symbols in table{7.I]are defined 
by the pre-processor. 


Table 7.1: Predefined Pre-Processor Symbols 


__amd64 

—_amd64__ 

__x86_64 
x86_64 
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Chapter 8 


Execution Environment 


Not done yet. 
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Chapter 9 


Conventions 


'This chapter is used to document some features special to the AMD64 ABI. The different 
sections might be moved to another place or removed completely. 
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9.1 C++ 


For the C++ ABI we will use the IA-64 C++ ABI and instantiate it appropriately. 
The current draft of that ABI is available at: 
http://www. codesourcery.com/cxx-—abi/ 
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9.2 Fortran 


A formal Fortran ABI does not exist. Most Fortran compilers are designed for 
very specific high performance computing applications, so Fortran compilers use 
different passing conventions and memory layouts optimized for their specific 
purpose. For example, Fortran applications that must run on distributed memory 
machines need a different data representation for array descriptors (also known as 
dope vectors, or fat pointers) than applications running on symmetric multipro- 
cessor shared memory machines. A normative ABI for Fortran is therefore not 
desirable. However, for interoperability of different Fortran compilers, as well as 
for interoperability with other languages, this section provides some some guide- 
lines for data types representation, and argument passing. The guidelines in this 
section are derived from the GNU Fortran 77 (G77) compiler, and are also fol- 
lowed by the GNU Fortran 95 (gfortran) compiler (restricted to Fortran 77 
features). Other Fortran compilers already available for AMD64 at the time of 
this writing may use different conventions, so compatibility is not guaranteed. 

When this text uses the term Fortran procedure, the text applies to both For- 
tran FUNCTION and SUBROUTINE subprograms as well as for alternate ENTRY 
points, unless specifically stated otherwise. 

Everything not explicitely defined in this ABI is left to the implementation. 


9.2.1 Names 


External names in Fortran are names of entities visible to all subprograms at link 
time. This includes names of COMMON blocks and Fortran procedures. To avoid 
name space conflicts with linked-in libraries, all external names have to be man- 
gled. And to avoid name space conflicts of mangled external names with local 
names, all local names must also be mangled. The mangling scheme is straight- 
forward as follows: 


e all names that do not have any underscores in it should have one underscore 
appended 


e all external names containing one or more underscores in it (whereever) 
should have two underscores appended P| 


e all external names should be mapped to lower case, following the traditional 
UNIX model for Fortran compilers 


"Historically, this is to be compatible with f2c. 
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For examples see figure |9. I} 


Figure 9.1: Example mapping of names 


Fortran external name Linker name 


FOO foo_ 
foo foo_ 
Foo foo_ 
foo_ foo___ 
f_oo FOO. 


The entry point of the main program unit is called MAIN__. The symbol name 
for the blank common block is __BLNK__. the external name of the unnamed 
BLOCK DATA routineis_ BLOCK DATA _. 


9.2.2 Representation of Fortran Types 


For historical reasons, GNU Fortran 77 maps Fortran programs to the C ABI, so 
the data representation can be explained best by providing the mapping of Fortran 
types to C types used by G77 on AMD64)| as in figure The “TYPE«N” no- 
tation specifies that variables or aggregate members of type TYPE shall occupy N 
bytes of storage. 


Figure 9.2: Mapping of Fortran to C types 


Fortran Data kind Equivalent C type 
INTEGER*4 | Default integer signed int 
INTEGER*«8 | Double precision integer signed long 
REAL«4 Single precision FP number float 
REAL«8 Double precision FP number double 
COMPLEXx4 | Single precision complex FP number | complex float 
COMPLEXx8 | Double precision complex FP number | complex double 


LOGICAL Boolean logical type signed int 
CHARACTER | Text string char[] + length 


3G77 provides a header g2c . h with the equivalent C type definitions for all supported Fortran 
scalar types. 
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The values for type LOGICAL are . TRUE. implemented as 1 and . FALSE. 
implemented as 0. 

Data objects with a CHARACTER typ] are represented as an array of char- 
acters of the C char type (not guaranteed to be “\0” terminated) with a separate 
length counter to distinguish between CHARACTER data objects with a length pa- 
rameter, and aggregate types of CHARACTER data objects, possibly also with a 
length parameter. 

Layout of other aggregate types is implementation defined. GNU Fortran puts 
all arrays in contiguous memory in column-major order. GNU Fortran 95 builds 
an equivalent C struct for derived types without reordering the type fields. Other 
compilers may use other representations as needed. The representation and use of 
Fortran 90/95 array descriptors is implementation defined. Note that array indices 
start at 1 by default. 

Fortran 90/95 allow different kinds of each basic type using the kind type 
parameter of a type. Kind type parameter values are implementation defined. 

Layout of he commonly used Cray pointers is implementation defined. 


9.2.3 Argument Passing 


For each given Fortran 77 procedure, an equivalent C prototype can be derived. 
Once this equivalent C prototype is known, the C ABI conventions should be 
applied to determine how arguments are passed to the Fortran procedure. 

G77 passes all (user defined) formal arguments of a procedure by reference. 
Specifically, pointers to the location in memory of a variable, array, array element, 
a temporary location that holds the result of evaluating an expression or a tempo- 
rary or permanent location that holds the value of a constant (xf. g77 manual) 
are passed as actual arguments. Artificial compiler generated arguments may be 
passed by value or by reference as they are inherently compiler and hence imple- 
mentation specific. 

Data objects with a CHARACTER type are passed as a pointer to the charac- 
ter string and its length, so that each CHARACTER formal argument in a Fortran 
procedure results in two actual arguments in the equivalent C prototype. The first 
argument occupies the position in the formal argument list of the Fortran proce- 
dure. This argument is a pointer to the array of characters that make up the string, 
passed by the caller. The second argument is appended to the end of the user- 
specified formal argument list. This argument is of the default integer type and 


4This includes sub-strings. 
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its value is the length of the array of characters, that is the length, passed as the 
first argument. This length is passed by value. When more than one CHARACTER 
argument is present in an argument list, the length arguments are appended in the 
order the original arguments appear. The above discussion also applies to sub- 
strings. 

This ABI does not define the passing of optional arguments. They are allowed 
only in Fortran 90/95 and their passing is implementation defined. 

This ABI does not define array functions (function returning arrays). They are 
allowed only in Fortran 90/95 and requires the definition of array descriptors. 

Note that Fortran 90/95 procedure arguments with the INTENT (IN) attribute 
should also passed by reference if the procedure is to be linked with code written in 
Fortran 77. Fortran 77 does not and can not support the INTENT attribute because 
it has no concept of explicit interfaces. It is therefore not possible to declare the 
callee’s arguments as INTENT (IN). A Fortran 77 compiler must assume that all 
procedure arguments are INTENT (INOUT) in the Fortran 90/95 sense. 


9.2.4 Functions 


The calling of statement functions is implementation defined (as they are defined 
only locally, the compiler has the freedom to apply any calling convention it likes). 

Subroutines with alternate returns (e.g. "SUBROUTINE X(*,*)" called as 
"CALL X(*10,*20)") are implemented as functions returning an INTEGER of the 
default kind. The value of this returned integer is whatever integer is specified 
in the "RETURN" statement for the subroutine | or 0 for a RETURN statement 
without an argument. It is up to the caller to jump to the corresponding alternate 
return label. The actual alternate-return arguments are omitted from the calling 


sequence. 
An example: 
SUBROUTINE SHOW_ALTERNATE RETURN (N) 

INTEGER N 
CALL ALTERNATE _RETURN_EXAMPLE (N, *10, +*20, x*30) 
WRITE (*«,*) ’OK — Normal Return’ 
RETURN 

10 WRITE (x*,*) ’l1lst alternate return’ 
RETURN 

20 WRITE (*,*) ‘2nd alternate return’ 


> This integer indicates the position of an alternate return from the subroutine in the formal 
argument list 
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RETURN 
30 WRITE (*,*) ’2nd alternate return’ 

RETURN 

END 

SUBROUTINE ALTERNATE_RETURN_EXAMPLE (N, *, *, *) 
INTEGER N 
IF (N .EQ. O ) RETURN ! Implicit "RETURN 0" 
IF ( N .EQ 1 ) RETURN 1 
IF (N .EQ. 2 ) RETURN 2 
RETURN 3 

END 


Here the SUBROUTINE ALTERNATE_RETURN_EXAMPLE is implemented 
as a function returning an INTEGER*4 with value 0 if Nis 0, lif Nis 1,2 if N 
is 2 and 3 for all other values of N. This return value is used by the caller as if the 
actual call were replaced by this sequence: 


INTEGER X 
X = CALL ALTERNATE_RETURN_EXAMPLE (N) 
GOTO (10, 20, 30), X 


All in all the effect is that the index of the returned to label (starting from 1) 
will be contained in %rax after the call. 

Alternate ENTRY points of a SUBROUTINE or FUNCTION should be treated 
as separate subprograms, as mandated by the Fortran standard. Ie. arguments 
passed to an alternate ENTRY should be passed as if the alternate ENTRY is a sep- 
arate SUBROUTINE or FUNCTION. If a FUNCTION has alternate ENTRY points, 
the result of each of the alternate ENTRY points must be returned as if the alternate 
ENTRY is a separate FUNCTION with the result type of the alternate ENTRY. The 
external naming of alternate ENTRY points follows/|9.2.1 


9.2.5 COMMON blocks 


In absence of any EQUIVALENCE declaration involving variables in COMMON 
blocks the layout of a COMMON block is exactly the same as the layout of the 
equivalent C structure (with types of variables substituted according to section 
(9.2.2), including the alignment requirements. 

This ABI defines the layout under presence of EQUIVALENCE statements 
only in some cases: 


e the layout of the COMMON block must not change if one ignores the EQUIVALENCE, 
which amongst other things means: 
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e Iftwo arrays are equivalenced, the larger array must be named in the COMMON 
block, and there must be complete inclusion, in particular the other array 
may not extend the size of the equivalenced segment. It may also not change 
the alignment requirement. 


e If an array element and a scalar are equivalenced, the array must be named 
in the COMMON block and it must not be smaller than the scalar. The type of 
the scalar must not require bigger alignment than the array. 


e if two scalars are equivalenced they must have the same size and alignment 
requirements. 


Other cases are implementation defined. 

Because the Fortran standard allows the blank COMMON block to have different 
sizes in different subprograms, it may be impossible to determine if it is small 
enough to fit in the . bss section. When compiling for the medium or large code 
models the blank COMMON block should therefore always be put in the .1lbss 
section. 


9.2.6 Intrinsics 


This sections lists the set of intrinsics which has to be supported at minimum by 
a conforming compiler. They are separated by origin. They follow regular calling 
and naming conventions. 


The signature of intrinsics uses the syntax return—type(argtypel, argtype?, ... 


where the individual types can be the following characters: V (as in void) des- 
ignates a SUBROUTINE, L a LOGICAL, I an INTEGER, R a REAL, and Ca 
CHARACTER. Hence I (R,L) designates a FUNCTION returning an INTEGER 
and taking a REAL and a LOGICAL. If an argument is an array, this is indicated 
using a trailing number, e.g. I13 is an INTEGER array with 13 elements. If a 
CHARACTER argument or return value has a fixed length, this is indicated using 
an asterisk and a trailing number, for example C*16 is a CHARACTER (len=16). 
Ifa CHARACTER argument of arbitrary length must be passed, the trailing number 
is replaced with N, for example C*N. 
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Table 9.1: Mil intrinsics 


Name Signature Meaning 
BTest | LGD Test bit 
TAnd 1d,D Boolean AND 
TOr 1,D Boolean OR 
TEOr 1d,D Boolean XOR 
Not I) Boolean NOT 
IBClr | Id,D Clear a bit 
IBits ULL) Extract a bit subfield of a variable 
TBSet | Id,D Set a bit 
Ishft | Id,D Logical bit shift 
TShftc | Id,LD Circular bit shift 
MvBits | V¢LLLLD | Move a bit field 


BTest (I, Pos) Returns . TRUE. if bit Pos in I is set, returns .FALSE. oth- 
erwise. 


IAnd (I, J) Returns value resulting from a boolean AND on each pair of bits 
in I and J. 


Or (I, J) Returns value resulting from a boolean OR on each pair of bits in 
I and J. 


EOr (I, J) Returns value resulting from a boolean XOR on each pair of bits 
in I and J. 


Not (1) Returns value resulting from a boolean NOT on each bit in I. 


IBClr (I, Pos) Returns the value of I with bit Pos cleared (set to zero). 


IBits (I, Pos, Len) Extracts a subfield starting from bit position Pos and 
with a length (towards the most significant bit) of Len bits from I. The 
result is right-justified and the remaining bits are zeroed. 


IBSet (I, Pos) Returns the value of I with the bit in position Pos set to one. 
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IShft (1, Shift) All bits of I are shifted Shift places. Shift .GT.0 in- 
dicates a left shift, Shift..EQ.0O indicates no shift, and Shift. LT.0 
indicates a right shift. Bits shifted out from the least (when shifting right) 
or most (when shifting left) significant position are lost. Bits shifted in at 
the opposite end are not set (i.e. zero). 


IShftc (I, Shift, Size) The rightmost Size bits of the argument I are 
shifted circularly Shift places. The unshifted bits of the result are the 
same as the unshifted bits of I. 


MvBits (From, FromPos, Len, To, ToPos) Move Len bits of From from 
bit positions FromPos through FromPos+Len~1 to bit positions ToPos 
through ToPos+Len-~—1 of To. The bit portions of To that are not affected 
by the movement of bits are unchanged. 
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Table 9.2: F77 intrinsics 


Name Meaning 
Abs Absolute value 

ACos_ | Arc cosine 

Alnt Truncate to whole number 

ANInt | Round to nearest whole number 

ASin | Arc sine 

ATan_ | Arc Tangent 

ATan2 | Arc Tangent 

Char | Character from code 

Cmplx | Construct COMPLEX (KIND=1) value 
Conjg | Complex conjugate 

Cos Cosine 

CosH_ | Hyperbolic cosine 

Dble Convert to double precision 

DiM Difference magnitude (non-negative subtract) 
DProd | Double-precision product 

Exp Exponential 

IChar | Code for character 

Index | Locate a CHARACTER substring 

Int Convert to INTEGER value truncated to whole number 
Len Length of character entity 

LGe Lexically greater than or equal 

LGt Lexically greater than 

LLe Lexically less than or equal 

LLt Lexically less than 

Log Natural logarithm 

Log10 | Common logarithm 

Max Maximum value 

Min Minimum value 

Mod Remainder 

NInt Convert to INTEGER value rounded to nearest whole number 
Real Convert value to type REAL (KIND=1) 
Sin Sine 

SinH | Hyperbolic sine 

SqRt | Square root 

Tan Tangent 

TanH_ | Hyperbolic tangent 115 
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Refer to the Fortran 77 language standard for signature and definition of the 
F77 intrinsics listed in table These intrinsics can have a prefix as per the 
standard hence the table is not exhaustive. 


Table 9.3: F90 intrinsics 


Name Meaning 
AChar ASCII character from code 
Bit_Size Number of bits in arguments type 
CPU_Time Get current CPU time 
TAChar ASCII code for character 
Len_Trim Get last non-blank character in string 
System_Clock | Get current system clock value 


Refer to the Fortran 90 language standard for signature and definition of the 
F90 intrinsics listed in table 


Table 9.4: Math intrinsics 


Name Signature Meaning 
BesJO | R(R) Bessel function 
BesJ1 | R(R) Bessel function 
BesJN | R(,R) Bessel function 
BesYO | R(R) Bessel function 
BesY1 | R(R) Bessel function 
BesYN | RC,R) Bessel function 
ErF R(R) Error function 
ErFC | R(R) Complementary error function 

[Rand | I) Random number 
Rand | Rd) Random number 
SRand | Vd) Random seed 


BesJO (xX) Calculates the Bessel function of the first kind of order 0 of X. Returns 


a REAL of the same kind as X. 
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BesJl 


BesJN 


BesYO 


BesYl 


BesYN 


Erk 


ErFC 


TRand 


Rand 


SRand 


(X) Calculates the Bessel function of the first kind of order 1 of X. Returns 
a REAL of the same kind as X. 


(N, X) Calculates the Bessel function of the first kind of order N of X. 
Returns a REAL of the same kind as X. 


(X) Calculates the Bessel function of the second kind of order 0 of X. 
Returns a REAL of the same kind as X. 


(X) Calculates the Bessel function of the second kind of order 1 of X. 
Returns a REAL of the same kind as X. 


(N, X) Calculates the Bessel function of the second kind of order N of X. 
Returns a REAL of the same kind as X. 


(X) Calculates the error function of X. Returns a REAL of the same kind 
as X. 


(X) Calculates the complementary error function of X,i.e.1 — ERF (X). 
Returns a REAL of the same kind as X. 


(Flag) Flag is optional. Returns a uniform quasi-random number up to 
a system-dependent limit. If Flag .EQ. 0 or Flag is not passed, the 
next number in sequence is returned. If Flag .EQ. 1, the generator is 
restarted. If Flag has any other value, the generator is restarted with the 
value of Flag as the new seed. 


(Flag) Flag is optional. Returns a uniform quasi-random number between 
OQ and 1. If Flag .EQ. O or Flag is not passed, the next number in 
sequence is returned. IfFlag .EQ. 1, the generator is restarted. If Flag 
has any other value, the generator is restarted with the value of Flag as the 
new seed. 


(Seed) Reinitializes the random number generator for TRand and Rand 
with the seed in Seed. 
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Table 9.5: Unix intrinsics 


Name Signature Meaning 

Abort | VQ Abort the program 

Access | I(C,C) Check file accessibility 

DTime | V(R2,R) Get elapsed time since last call 

ETime | V(R2,R) Get elapsed time for process 

Flush | Vd) Flush buffered output 

FNum I(1) Get file descriptor from Fortran unit number 

FStat V3.) Get file information 

GError | V(C*N) Get error message for last error 

GetArg | V(I,C*N) Obtain command-line argument 

GetCwD | V(C*N,D) Get current working directory 

GetEnv | V(C*N,C*N) | Get environment variable 

GetGId | IQ Get process group ID 

GetPId | IQ Get process ID 

GetUId | IQ Get process user ID 

GetLog | V(C*N) Get login name 

HostNm | V(C*N,D Get host name 

TArgc | 10 Obtain count of command-line arguments 

IDate | V3) Get local date info 

IErrNo | IQ Get error number for last error 

ITime | V(I3) Get local time of day 

LStat V(C*N,113,D | Get file information 

PError | V(C*N) Print error message for last error 

Rename | V(C*N,C*N,D) | Rename file 

Sleep | V(1) Sleep for a specified time 

System | V(C*N,D Invoke shell (system) command 
Abort () Prints a message and potentially causes a core dump. 


Access (Name, Mode) Checks file Name for accessibility in the mode specified 
by Mode. Returns 0 if the file is accessible in that mode, otherwise an er- 
ror code. Name must be a NULL-terminated string of CHARACTER (ie. 
a C-style string). Trailing blanks in Name are ignored. Mode must be a 
concatenation of any of the following characters: r meaning test for read 
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DTime 


ETime 


Flush 


FNum 


FStat 


permission, w meaning test for write permission, x meaning test for exe- 
cute/search permission, or a space meaning test for existence of the file. 


(TArray, Result) When called for the first time, returns the num- 
ber of seconds of runtime since the start of the program in Result, the 
user component of this runtime in TArray (1), and the system time in 
TArray (2). Subsequent invocations values based on accumulations since 
the previous invocation. 


(TArray, Result) Returns the number of seconds of runtime since 
the start of the program in Result, the user component of this runtime 
in TArray (1), and the system time in TArray (2) . Subsequent invoca- 
tions values based on accumulations since the previous invocation. 


(Unit) Flushes the Fortran I/O unit with ID Unit. The unit must be 
open for output. If the optional Unit argument is omitted, all open units 
are flushed. 


(Unit) Returns the UNIX(tm) file descriptor number corresponding to the 
Fortran I/O unit Unit. The unit must be open. 


(Unit, SArray, Status) Obtains data about the file open on For- 
tran I/O unit Unit and places it in the array SArray. The values in this 
array are as follows: 

. Device ID 

. Inode number 

. File mode 

. Number of links 

. Owner’s UID 

Owner’s GID 

ID of device containing directory entry for file 


File size (bytes) 


Cm NDA R WN = 


Last access time 


— 
= 


Last modification time 


— 
— 


. Last file status change time 
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12. Preferred I/O block size (-1 if not available) 
13. Number of blocks allocated (-1 if not available) 


If an element is not available, or not relevant on the host system, it is re- 
turned as 0 except when indicated otherwise in the above list. If the optional 
Status argument is supplied, it contains 0 on success or a nonzero error 
code upon return. 


Gerror (Message) Returns the system error message corresponding to the last 
system error (errno in C). The message is returned in Message. IfMessag 


is longer than the error message, it is padded with blanks after the message. 
If Message is not long enough to hold the error message, the error message 
is truncated to the length of Message. 


GetArg (Pos, Value) Returns in Value the command-line argument in posi- 
tion Pos. If there are fever than Pos command-line arguments, Value 
is filled with blanks. If Pos is 0, the name of the program is returned. If 
Value is longer than the command-line argument, it is padded with blanks 
after the argument. If Value is not long enough to hold the command-line 
argument, the argument is truncated to the length of Value. 


GetCWD (Name, Status) Returns in Name the current working directory. If 
the optional Status argument is supplied, it contains 0 on success or a 
nonzero error code upon return. 


GetEnv (Name, Value) Returns in Value the environment variable identified 
with Name. If Name has not been set, Value is filled with blanks. A null 
character marks the end of the name in Name. Trailing blanks in Name are 
ignored. If Value is longer than the environment variable, it is padded 
with blanks after the variable. If Value is not long enough to hold the 
environment variable, the variable is truncated to the length of Value. 


GetGId () Returns the group ID for the current process. 


GetPId () Returns the process ID for the current process. 


GetUId () Returns the user ID for the current process. 


GetLog (Login) Returns the login name for the process in Login, or a blank 
string if the host system does not support get login(3). If Login is 
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HotNm 


IArgC 


IDate 


ITErrno 


ITime 


LStat 


longer than the login name, it is padded with blanks after the login name. 
If Login is not long enough to hold the login name, the login name is 
truncated to the length of of Login. 


(Name, Status) Returns in Name system’s host name. If the optional 
Status argument is supplied, it contains 0 on success or a nonzero error 
code upon return. If Name is longer than the host name, it is padded with 
blanks after the host name. If Name is not long enough to hold the host 
name, the host name is truncated to the length of of Name. 


() Returns the number of command-line arguments. The program name 
itself is not included in this number. 


(TArray) Returns the current local date day, month, year in elements 1, 
2, and 3 of Tarray, respectively. The year has four significant digits. 


() Returns the last system error number (errno in C). 


(TArray) Retums the current local time hour, minutes, and seconds in 
elements 1, 2, and 3 of TArray, respectively. 


(File, SArray, Status) Obtains data about a file named File and 
places places it in the array SArray. The values in this array are as follows: 
Device ID 
Inode number 
File mode 
Number of links 
Owner’s UID 
Owner’s GID 
ID of device containing directory entry for file 


File size (bytes) 


Oe TOP eel ON I Te a 


Last access time 


— 
a 


Last modification time 


—_ 
— 


. Last file status change time 
Preferred I/O block size (-1 if not available) 


— 
N 
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PError 


Rename 


Sleep 


system 


13. Number of blocks allocated (-1 if not available) 


If an element is not available, or not relevant on the host system, it is re- 
turned as 0 except when indicated otherwise in the above list. If the optional 
Status argument is supplied, it contains 0 on success or a nonzero error 
code upon return. 


(MsgPrefix) Prints a newline-terminated error message corresponding 
to the last system error. This is prefixed by the string MsgPrefix, acolon 
and a space. The error message is printed on the C stderr stream. 


(Pathl, Path2, Status) Renames the filenamed Pathl toPath2. 
A null character marks the end of the names. Trailing blanks are ignored. 
If the optional Status argument is supplied, it contains 0 on success or a 
nonzero error code upon return. 


(Seconds) Causes the program to pause for Seconds seconds. 


(Command, Status) Passes the string in Command to a shell though 
system (3). If the optional argument St at us is present, it contains the 
value returned by system(3). 
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Appendix A 


Linux Conventions 


This chapter describes some details that are only relevant to GNU/Linux systems 
and the Linux kernel. 


A.1_ Execution of 32-bit Programs 


The AMD64 processors are able to execute 64-bit AMD64 and also 32-bit ia32 
programs. Libraries conforming to the Intel386 ABI will live in the normal places 
like /lib, /usr/lib and /usr/bin. Libraries following the AMD64, will 
use 1ib64 subdirectories for the libraries, e.g /1ib64 and /usr/1ib64. Pro- 
grams conforming to Intel386 ABI and to the AMD64 ABI will share directories 
like /usr/bin. In particular, there will be no /bin64 directory. 


A.2.) AMD64 Linux Kernel Conventions 


The section is informative only. 


A.2.1 Calling Conventions 


The Linux AMD64 kernel uses internally the same calling conventions as user- 
level applications (see section [3.2.3] for details). User-level applications that like 
to call system calls should use the functions from the C library. The interface 
between the C library and the Linux kernel is the same as for the user-level appli- 
cations with the following differences: 
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1. User-level applications use as integer registers for passing the sequence 
Srdi, *rsi, *rdx, $rcx, 3r8 and $r9. The kernel interface uses 3rdi, 
Srsi, srdx, $r10, $r8 and $r9. 


2. A system-call is done via the syscall instruction. The kernel destroys 
registers $rcx and $r11. 


3. The number of the syscall has to be passed in register Srax. 


4. System-calls are limited to six arguments, no argument is passed directly on 
the stack. 


5. Returning from the syscall, register Srax contains the result of the 
system-call. A value in the range between -4095 and -1 indicates an error, 
itis -errno. 


6. Only values of class INTEGER or class MEMORY are passed to the kernel. 


A.2.2 Stack Layout 


The Linux kernel does not honor the red zone (see and therefore this area is 
not allowed to be used by kernel code. Kernel code should be compiled by GCC 
with the option -mno-red-zone. 


A.2.3 Required Processor Features 


Any program or kernel can expect that a AMD64 processor implements the fea- 
tures mentioned in table In general a program has to check itself whether 
those features are available but for AMD64 systems, these should always be avail- 
able. Table uses the names for the processor features as documented in the 
processor manual. 


A.2.4 Miscellaneous Remarks 


Linux Kernel code is not allowed to change the x87 and SSE units. If those are 
changed by kernel code, they have to be restored properly before sleeping or leav- 
ing the kernel. On preemptive kernels also more precautions may be needed. 
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Table A.1: Required Processor Features 


Feature | Comment 

Features need for programs 
fpu Necessary for long double, MMX 
tsc User-visible 
cx8 User-visible 
cmov User-visible 
mmx User-visible 
sse User-visible, required for float 
sse2 User-visible, required for double 
fxsr Required for SSE/SSE2 
syscall | For calling the kernel 

Features need in the kernel 
pae This kind of page tables is used 
pse PAE needs PSE. 
msr At least needed to enter long mode 
pge Kernel optimization 
pat Kernel optimization 
clflush | Kernel optimization 
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Index 


, cfi_adjust_cfa_offset,|96] 
.cfi_def_cfa,[94 
.cfi_def_cfa_offset,06 

3 cfi_def_cfa_register,|94 
< cfi_endproc,|96 
.cfi_escape,|97| 
.cfi_offset,|96] 
.cfi_rel_offset,|97| 
.cfi_startproc, [94 
.eh_frame, |96] 

%rax,|46| 

A_CLEANUP_PHASE, 
A_FORCE_UNWIND, 
A_SEARCH_PHASE, 


U 


U 


U 


U 


U 


oS & fs 5 


wind_Context,|[8]] 
wind_DeleteException, 
win d_Exception,]] 

win d_ForcedUnwind, [81] [82] 
wind_GetCFA,|8]] 
wind_GetGR,|8]] 

wind_Get IP,|8]] 


boolean, 9] 
byte, 


C++,|102 
Call Frame Information tables, 
code models, 


double quadword, 
doubleword, 


DT_FINI,|7/9| 
DT_FINI_ARRAY,|/9 
DT_INIT,|79 


DT_INIT_ARRAY, 

DT_JMPREL, 
DT_PREINIT_ARRAY, 

DWARF Debugging Information Format, 


eightbyte, 
exceptions, 
exec, 


p£eget round, 80] 


wind_GetLanguageSpecificl 
wind_GetRegionStart, [i] 
win d_RaiseExcept ion, [81][82| 
wind_Resume,|81] 
wind_SetGR,[8]] 
wind_SetIP,|8]] 


—__float128, 
auxiliary vector, 


fourbyte, 

global offset table, 
halfword, [7] 

Kernel code model, 
Large code model, 
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Large position independent code model, 
Long jmp, 


Medium code model, 
Medium position independent code model, 


PIC, [30}/31] 
Procedure Linkage Table, 


procedure linkage table, 
program interpreter, 


quadword, 


R_X86_64_JUMP_SLOT,[75}[76] 
R_X86_64_TLSDESC,|76} 


red zone, 
register save area, |47| 


signal, 

sixteenbyte, 

size_t, (9 

Small code model, [29] 

Small position independent code model, 


terminate (), 
Thread-Local Storage, 


twobyte, 
Unwind Library interface, 


va_arg,|49| 
va_list, 
va_start, 49] 


word, 
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